Final Report on the International Workshop
on High-level Language Computer Architecture


Reported  by  Yaohan  Chu 
June  30,  1980 


This is the final report for the International Workshop on High-level
Language Computer Architecture (HLLCA).

The Workshop was made possible by partial support from the Office of Naval
Research (ONR). The details of the Workshop are reported below.

1.  Summary  of  the  Grant 

Title:  International  Workshop  on  High-level  Language  Computer  Architecture 

Period:  7/1/79  -  6/30/80 

Grant  no.:  N00014-79-C-0604 

Grant  Amount:  $9,860.00 

Principal  Investigator:  Professor  Yaohan  Chu 

Department  of  Computer  Science 
University  of  Maryland 
College  Park,  MD  20742 
301-454-4245 

2.  Workshop 

Date: May 26-28, 1980

Location: Fort Lauderdale, FL

No. of Registrants:
Technical program:
Tutorial program:

Programs: See Appendix C

Proceedings:

3. Organization

The workshop was organized by the Workshop Committee. There were
four members on the Workshop Committee; their names are shown in Appendix C.

The technical program was organized by the Program Committee, whose
chairman was Dr. Yaohan Chu. There were 17 members; their names
are also shown in Appendix C. There were 26 papers in 8 sessions, in addition
to a panel discussion session. The details of this program are shown in
Appendix C.

The tutorial program was organized by Dr. Keith Doty. There were 5
lecturers; each provided a set of notes. The names of the lecturers are
shown in Appendix C. The other working members of the workshop are also
shown in Appendix C.

The Workshop Committee approved travel allowances for 4
international participants, with presentation of a paper as the minimum
requirement. Their names are shown below.

(1) Professor Yoong-Nien Chen
Department of Computers
University of Science and Technology
City of Hefei, Province of Anhui,
The People's Republic of China
Amount: $1,000

(2) Dr. Masahiro Yamamoto
Central Research Laboratory
Nippon Electric Company, Ltd.
Japan
Amount: $750

(3) Dr. Esen A. Ozkarahan
Middle East Technical University
Ankara, Turkey
Amount: $500

(4) Mr. J.P. Sansonnet
Universite de Paul Sabatier
Toulouse, France
Amount: $500


4.  Next  Workshop 


The Workshop Committee met on May 28, 1980 and decided to hold
another workshop because attendance exceeded expectations. The following
was decided.


Date: May 17-20, 1982

Location:  Fort  Lauderdale 
Program  Chairman:  Dr.  Lee  Hoevel 
Program  Vice  Chairman:  Dr.  George  Ligler 


5.  International  Participation 


The Workshop was truly international, with participants from
12 countries: Brazil, Canada, China, France, Ireland, Italy, Japan, Sweden,
Turkey, United Kingdom, U.S.A., and West Germany.
6. Official Reports Distribution List


Defense Documentation Center
Cameron Station
Alexandria, VA 22314

Office of Naval Research
Arlington, VA 22217

Information Systems Program (437)
Code 200
Code 455
Code 453

Office of Naval Research
Branch Office, Boston
Bldg 114, Section D
666 Summer Street
Boston, MA 02210

Office of Naval Research
Branch Office, Chicago
536 South Clark Street
Chicago, IL 60605

Office of Naval Research
Branch Office, Pasadena
1030 East Green Street
Pasadena, CA 91106

Naval Research Laboratory
Technical Information Division, Code 2627
Washington, D.C. 20375

Dr. A. L. Slafkosky
Scientific Advisor
Commandant of the Marine Corps (Code RD-1)
Washington, D.C. 20308

Naval Ocean Systems Center
Advanced Software Technology Division
Code 5200
San Diego, CA 92152

Mr. C. H. Gleissner
Naval Ship Research and Development Center
Computation and Mathematics Department
Bethesda, MD 20084

Captain Grace M. Hopper (008)
Naval Data Automation Command
Washington Navy Yard
Building 166
Washington, D.C. 20374


12 copies
1 copy
1 copy
1 copy
1 copy
1 copy
1 copy
1 copy
Leon S. Levy

Bell  Telephone  Laboratories 
Whippany,  NJ  07981 
(201)  386-4955 


Tim  Merrigan 
Floating  Point  Systems 
P.O. Box 23489
Portland,  OR.  97223 
(503)  641-3151 


Hartmut G. Huber
Naval Surface Weapon Center
Box 117
Dahlgren, Va. 22448
(703) 663-8656 (office)
(703) 775-7046 (home)

N.R.  Harris 

Stanford  University 

Computer  Systems  Lab 

Department  of  Electrical  Engineering 

Stanford  CA  94305 

(415)  497-3511 

Mary  Miller 

Bell  Laboratories 

30W062  Capistrano  Ct.  Apt.  302 

Naperville  IL  60540 

(312)  462-4269  (office) 

John  J.  Zaloudek 
Naval  Surface  Weapons  Center 
Dahlgren,  Va.  22401 
(703)  663-7368 

E.  Dean  Earnest 
Burroughs  Corporation 
25725  Jeronimo  Rd. 

Mission  Viejo,  CA  92691 
(714)  768-2321 

Heinz Schlutter
Gesellschaft fur Mathematik und
Datenverarbeitung, MBH
Postfach 1240
Schloss Birlinghoven
D-5205 St. Augustin 1
Bonn, West Germany

Dr.  Klaus  Berkling 

(same  address  as  Schlutter) 

Giorgio Sofi
CSELT, Via Reiss Romoli
Torino, Italy 10129
tele. 21691


Reinhard G. Kofer
Siemens AG, ZFE-FL-SAR 112
Otto Hahn Ring 6
8 Muenchen 83 West Germany

Richard C. Fleming
The Aerospace Corporation
M.S. A2/2043
P.O. Box 92957
Los Angeles CA 90009
(213) 648-7098

Dr. G. U. Merckel
IBM Dept. 24K Bldg 032-3
2000 NW 51 Street
Boca Raton FL. 33432
(305) 994-4763

Melvin Hallerman
IBM Dept. 24K Bldg 032-3
2000 NW 51 Street
Boca Raton FL. 33432

Kerry  V.  Richmond 

McDonnell  Douglas  Astronautics  Co. 

P.O. Box 516

St. Louis, MO 63166

James  D.  Mooney 
West  Virginia  University 
Dept.  STAT.  &  COMP.  Science 
Morgantown, WV 26506
(304) 293-3607

Meir  Kaftor  M/S  B100 
Honeywell  Information  Systems 
P.O. Box 6000
Phoenix,  AZ.  85005 
(602)  866-3381 

Nobuyuki  Goto 
Toshiba  Corporation 
1 Komukai-Toshiba-cho, Saiwai-ku
Kawasaki,  Japan  210 
(044)  511-2111 
List of Registrants (Technical Program)


Jack  B.  Dennis 

MIT  Lab  for  Computer  Science 
545  Main  Street 
Cambridge, MA. 02139
(617)  253-6856 


Harvey  G.  Cragon 
Texas  Instruments,  Inc. 
P.O.Box  225012 
Dallas,  TX  75265 
(214)  238-3023 


Mou-Shin Yang
Systems Engineering
6901 W. Sunrise Blvd.
Ft. Lauderdale Fla. 33313
(305)  587-2900  X6236 

Gilbert J. Hansen
Texas Instruments
P.O. Box 222013, MS 3407
Dallas, TX. 75222
(214) 462-4742

Daniel  L.  Slotnick 
University  of  Illinois 
283  Digital  Computer  Lab 
Dept. of Computer Science
(217)  333-6726 

Terry  Welch 
Sperry  Research 
100  North  Rd. 

Sudbury,  MA.  01776 
(617)  369-4000 

Samuel P. Harbison
Carnegie-Mellon University
602A Kelly Ave
Pittsburgh, Pa. 15221
(412) 731-1472

Charles  W.  Flink  II 

Naval  Surface  Weapon  Center 

K-74 

Dahlgren,  Va.  22401 
(703)  663-7517 

Bill  Kwinn 
Hewlett  Packard 
3404  E.  Harmony  Road 
Fort  Collins,  CO  80525 
(303) 226-3800 X3242

Jaishanker  Menon 
Dept  of  Computer  Science 
Ohio  State  University 
Columbus  Ohio  43210 
(614)  422-5813 


Leon  I.  Maissel 
IBM  Corp 

Dept.  C14,  Bldg  704, 

P.O.Box  390 
Poughkeepsie,  NY  12602 
(914)  463-2301 


Raymond  L.  Phoenix 
IBM  Corp 

Dept.  C14,  Bldg  704, 

P.O.Box  390 
Poughkeepsie,  NY  12602 
(914)  463-5445 

Zvi  Weiss 

IBM  Research  Center 
Yorktown  Heights,  NY  10598 
(914)  962-7036 

Richard  Ramseyer 
Honeywell  SRC  Research 
2600  Ridgway  Pkwy,  MN17-2352 
Minneapolis,  MN  55413 
(  )  378-5023 

Tetsuo Ida
Institute of Physical & Chem. Res.
2-1, Hirosawa,
Wako-shi, Saitama 351
Japan 

Greg  Bettice 
Naval Avionics Center
8125  Harrison  Drive 
Lawrence,  IN  46226 
(317)  353-3226 

Roger  R.  Bate 

Texas  Instruments,  Inc. 

P.O.Box  222013,  M/S  3407 
Dallas,  TX  75222 
(214)  462-4790 

Ron  Rutledge 
DOT/TSC,  P.O.Box  53 
Kendall  Square 
Cambridge,  MA  02142 
(617) 494-2038


Gerhard Herrscher
LITEF
Loerracher Strasse 18
7800 Freiburg
West Germany
0761-4901212

A. Speckhard
Aerospace Corporation
2350 E. El Segundo Blvd.
El Segundo CA 90245
(213) 648-7067

John Francis
Sanders Associates, Inc.
95 Canal Street
Nashua NH 03060
(603) 885-3746

Paula Bernstein
Bell Laboratories
Warrenville-Naperville Rds.
Naperville IL 60540
(312) 462-2898

R. F. Hobson
Simon Fraser University
S. F. University
Computer Science Department
Burnaby British Columbia V5A 1S6
(604) 291-4277

Dr. Werner Kluge
GMD/ISF
Postfach 1240
Schloss Birlinghoven
West Germany

Malcolm Muir
Datamedix, Inc.
555 Hillsboro Plaza
Deerfield Beach FL 33441
(305) 428-4526

Ronald L. Engelbrecht
NCR Corp. - E&M-Wichita
3718 N. Rock Road
Wichita KS 67218
(316) 688-8646

Dr. F.J. Burkowski
Computer Science Department
University of Manitoba
Room 545 Machray Hall
Winnipeg Manitoba, Canada R3T 2N2
(204) 474-8313

Allen Naum
Hewlett-Packard
1501 Page Mill Rd.

Tektronix, Inc.
P.O. Box 500 DS 63-311
Beaverton OR 97077
(503) 682-3411 x3081

R. Curtis
Canisius College
2011 Main Street
Buffalo NY 14208
(716) 831-7000

John Bowles
NCR Corporation
3325 Platt Springs Rd.
C. Columbia SC 29169
(803) 796-9250 x524

David M. Abrahamson
Department of Computer Science
Trinity College
Dublin 2 Ireland
772941 Ext. 1765

Hugh L. Applewhite
Honeywell 17-2352
2600 Ridgway N.E.
Minneapolis, MN 55413
(612) 378-4510

Pong-sheng Wang
Ohio State University
2036 Neil Avenue
Columbus, Ohio 43202
(614) 422-8039

Siemens AG - Bereich Systemtechnische
Guenther-Scharowsky-Strasse 2
(09131) 7-6W3

212-U60-7267

IBM T. J. Watson Research Center
P.O. Box 218
Yorktown Heights, N.Y. 10598
(914) 945-1285

Hartmut G. Huber
Naval Surface Weapon Center
Box 117
Dahlgren, Va. 22448
(703) 663-8656 (office)
(703) 775-7046 (home)

Richard C. Fleming
The Aerospace Corporation
M.S. A2/2043
P.O. Box 92957
Los Angeles, CA 90009

IBM Corporation
P.O. Box 1328, Dept. 25T 032-1
Boca Raton, Florida 33432
(305) 994-3458

John J. Zaloudek
Naval Surface Weapons Center
Dahlgren, Va. 22401
(703) 663-7368

E. Dean Earnest
Burroughs Corporation
25725 Jeronimo Rd.
Mission Viejo, CA 92691
(714) 768-2321

Heinz Schlutter
Gesellschaft fur Mathematik und
Datenverarbeitung, MBH
Postfach 1240
Schloss Birlinghoven
D-5205 St. Augustin 1
Bonn, West Germany

Dr.  Klaus  Berkling 

(same  address  as  Schlutter) 


Tim Merrigan
Floating Point Systems
P.O. Box 23489
Portland, OR. 97223
(503) 641-3151

Reinhard G. Kofer
Siemens AG, ZFE-FL-SAR 112
Otto Hahn Ring 6
8 Muenchen 83 West Germany

Mr. Lucas Moscato
No Address
Country: Brazil

Dr. G. U. Merckel
IBM Dept. 24K Bldg 032-3
2000 NW 51 Street
Boca Raton FL. 33432
(305) 994-4763

Melvin Hallerman
IBM Dept. 24K Bldg 032-3
2000 NW 51 Street
Boca Raton FL. 33432

Kerry V. Richmond
McDonnell Douglas Astronautics Co.
P.O. Box 516
St. Louis, MO 63166

James D. Mooney
West Virginia University
Dept. STAT. & COMP. Science
Morgantown, WV 26506
(304) 293-3607

Meir Kaftor M/S B100
Honeywell Information Systems
P.O. Box 6000
Phoenix, AZ. 85005
(602) 866-3381

Nobuyuki Goto
Toshiba Corporation
1 Komukai-Toshiba-cho, Saiwai-ku
Kawasaki, Japan 210
(044) 511-2111


Tutorial Program


Gerhard Herrscher
LITEF

Loerracher Strasse 18
7800  Freiburg 
West  Germany 
0761-4901212 

A.  Speckhard 
Aerospace Corporation
2350  E.  El  Segundo  Blvd. 

El  Segundo  CA  90245 
(213)  648-7067 

John  Francis 

Sanders  Associates,  Inc. 

95  Canal  Street 
Nashua  NH  03060 

(603)  885-3746 

Paula  Bernstein 
Bell  Laboratories 
Warrenville-Naperville  Rds. 
Naperville  IL  60540 
(312)  462-2898 

R. F.  Hobson 

Simon  Fraser  University 

S. F.  University 

Computer  Science  Department 
Burnaby British Columbia V5A 1S6

(604)  291-4277 

Dr.  Werner  Kluge 
GMD/ISF 
Postfach  1240 
Schloss Birlinghoven
West  Germany 

Malcolm  Muir 
Datamedix,  Inc. 

555  Hillsboro  Plaza 
Deerfield  Beach  FL  33441 
(305)  428-4526 

Ronald L. Engelbrecht
NCR Corp. - E&M-Wichita
3718 N. Rock Road
Wichita KS 67218
(316) 688-8646

Dr.  F.J.  Burkowski 

Computer  Science  Department 

University  of  Manitoba 

Room  545  Machray  Hall 

Winnipeg  Manitoba,  Canada  R3T  2N2 

(204) 474-8313


Allen  Brown 

Hewlett-Packard 

HPL/CRL 

1501  Page  Mill  Rd. 

Palo  Alto  CA  94304 
857-8776 

Keiji Kuwahara

Nikkei-McGraw-Hill 

2-1-2  Uchikanda,  Chiyoda-ku 

Tokyo  Japan 

(03)  256-1561 

Y.  El-zig 
Honeywell 
Honeywell  Plaza 
Minneapolis Minnesota 55408

David  E.  Heinen 
Tektronix, Inc.

P.O. Box 500 DS 63-311
Beaverton OR 97077
(503)  682-3411  x3845 

Lawrence  Katz 
Tektronix,  Inc. 

P.O. Box 500 DS 63-311
Beaverton  OR  97077 
(503)  682-3411  x3081 

R.  Curtis 
Canisius  College 
2011  Main  Street 
Buffalo  NY  14208 
(716)  831-7000 

John  Bowles 

NCR  Corporation 

3325  Platt  Springs  Rd. 

C.  Columbia  SC  29169 
(803)  796-9250  x524 

David  M.  Abrahamson 
Department  of  Computer  Science 
Trinity  College 
Dublin  2  Ireland 
772941  Ext.  1765 

Hugh  L.  Applewhite 
Honeywell 17-2352
2600  Ridgway  N.E. 

Minneapolis,  MN  55413 
(612) 378-4510

Leon I. Maissel
IBM Corp

Dept. C14, Bldg 704,

P.O.Box  390 
Poughkeepsie,  NY  12602 
(914)  463-2301 

Raymond  L.  Phoenix 
IBM  Corp 

Dept.  C14,  Bldg  704, 

P.O.Box  390 
Poughkeepsie,  NY  12602 
(914)  463-5445 

Zvi Weiss

IBM  Research  Center 
Yorktown  Heights,  NY  10598 
(914)  962-7036 

Richard  Ramseyer 
Honeywell  SRC  Research 
2600 Ridgway Pkwy, MN17-2352
Minneapolis,  MN  55413 
(  )  378-5023 

Tetsuo  Ida 

Institute of Physical & Chem. Res.
2-1, Hirosawa,

Wako-shi, Saitama 351
Japan 

Greg  Bettice 
Naval Avionics Center
8125  Harrison  Drive 
Lawrence,  IN  46226 
(317)  353-3226 

Roger R. Bate

Texas  Instruments,  Inc. 

P.O.Box  222013,  M/S  3407 
Dallas,  TX  75222 
(214)  462-4790 

Ron  Rutledge 
DOT/TSC,  P.O.Box  53 
Kendall  Square 
Cambridge,  MA  02142 
(617)  494-2038 


Robert F. Cmelik
Bell  Laboratories 
Room  7D-414 
600  Mountain  Ave. 

Murray  Hill  NJ  07974 
(201)  582-5797 

David  R.  Ditzel 
Bell  Laboratories 
2C-523 

Murray Hill NJ 07974

(201)  582-3655 

Thomas  A.  Almy 
Tektronix,  Inc.  M/S  50-384 
Box  500 

Beaverton  OR  97077 
644-0161  x6056 

Herman Hartig
Universitat Karlsruhe
Institut fur Informatik IV
75 Karlsruhe 1
Postfach 6380
Zirkel Nr. 2
W-Germany

Mary  Miller 
Bell  Laboratories 
30W062 Capistrano Ct. #302
Naperville, Ill. 60540
(312) 462-4269

John  Peterson 
University  of  Colorado 
2845 S. Gilpin
Denver  CO  80210 
(303)  629-2872 

Bernard  Lecussan 
36  Impasse  St.  Felix 
31400  Toulouse,  France 

Jean-Paul  Sansonnet 
15  Rue  ctre  Midi  Bat.l 
31400  Toulouse,  France 


David A. Patterson
University of California
Electrical Engineering and
Computer Sciences
Computer  Science  Division 
Berkeley,  CA  94720 


Lars-Erik Thorelli

Royal  Institute  of  Technology 

S-10044  Stockholm  Sweden 


N.G.  Frank  Thoma 
IBM 

4686  NW  2nd  Ct. 

Boca  Raton  FL  33431 
(305)  368-4676 

Joseph C. Rhodes, Jr.

IBM  Corporation 
P.O.  Box  1328 
Boca  Raton  FL  33432 
(305)  994-7654 

Goran  Bage 
LM  Ericsson 

S-12625  Stockholm,  Sweden 

Peter Klambatsen
IBM  Corporation 
2000  NW  51st  Street 
P.O.  Box  1328 
Boca  Raton,  FL  33064 
994-5098 


Moises Cases
IBM-GSD 
Yamato  Road 
Boca  Raton,  FL 
994-7992 

Dick  Conn 

Fairchild  Camera  & 
Instrument  Corp. 
464  Ellis  Street 
Mt. View, CA 94040
(415)  962-2337 

Jack  Quanstrom 
IBM  Corporation 
P.O.  Box  1328 
Dept. 24K/032-3
Boca  Raton,  FL  33432 
994-4770 


Tich T. Dao M/S 17-5904
Fairchild  Camera  & 

Instrument  Corp. 

464  Ellis  Street 
Mt.  View,  CA  94040 
(415)  962-7532 

Mark  T.  Michael 

US  Air  Force,  Avionics  Laboratory 
WPAFB,  OH  45433 
(513)  255-4920 
THE ARCHITECTURE OF A PARALLEL EXECUTION
HIGH-LEVEL LANGUAGE COMPUTER *


Pong-sheng Wang and Ming T. Liu

Department of Computer and Information Science
The Ohio State University
Columbus, Ohio 43210


This paper presents an internal language for
a high-level language computer to facilitate
parallel execution of arithmetic expressions and
concurrent statements and to perform try-ahead
operations for IF, WHILE, and REPEAT statements.
The architecture of such a computer is also
described, which consists of multiple independent
processors for language processing and parallel
computation. The increase in speed is achieved by
parallel execution, by try-ahead processing, and
by the pipeline effect created by the independent
processors simultaneously performing various
tasks. An algorithm that translates an arithmetic
expression into the internal language form is also
included in the appendix.


In the area of high-level computer
architecture, various machine organizations have
been proposed with features to increase the
program processing speed [1] [2]. These designs
include independent processors to perform various
tasks in language translation and execution, such
as the lexical processor, syntactic processor,
semantic processor, arithmetic processor, etc.
These processors operate simultaneously and
asynchronously, and create a pipeline effect in
the whole system. The concurrency among these
processors results in a speed increase in
language translation and execution.

In this paper, however, we look into another
possibility of gaining speed in high-level
language computers, namely, the parallel execution
of arithmetic expressions and concurrent
statements, and the try-ahead processing of
statements involving conditions, such as IF,
WHILE, and REPEAT. The scheme that we use here
calls for an indirect-execution architecture which
uses an internal language and is of type 3
according to Chu's classification [3]. Source
programs are translated into the internal
representation, which is then interpreted by the
machine hardware. We will describe first the
features in the internal language that make
parallel and try-ahead operations possible, and
then the computer organization for carrying out
these operations. We will discuss only the
features in the internal language that are
relevant to parallel execution and try-ahead
processing, and ignore others such as identifiers,
labels, etc., since they are immaterial to the
purpose of this paper and can be found in
other papers, e.g. [1] [4]. The syntax and
semantics of the high-level language constructs
are the same as those in PASCAL.

In Section 2.1, we first briefly describe the
notion of Parallel Execution Strings (PES) for
executing arithmetic expressions in parallel, and
then propose a linear representation scheme as the
internal language for a high-level computer. An
algorithm which translates an arithmetic
expression into the internal language form is
included in the Appendix. In Section 2.2, we
present a method of representing a concurrent
statement in the internal language so that it can
be executed concurrently. In Sections 2.3 through
2.5, we describe the representation of IF
statements, WHILE statements, and REPEAT
statements in the internal language for try-ahead
processing. The representation allows the
possible paths in a statement involving a
condition to be executed even before the
evaluation of the condition is completed.
Finally, a high-level computer organization is
presented in Section III, which includes
independent processors for language processing,
and multiple Semantic Processors and PES Access
Processors for parallel computations. In the
computer organization, each execution stream is
accessed and executed by a PES Access Processor
and a Semantic Processor. Each Semantic Processor
has its own Arithmetic Processor and Local Storage
for concurrent processing and try-ahead
processing.

* Research reported herein was supported in part
by NSF-MCS-77-23496.


II. Internal Language Constructs


2.1 Arithmetic Expressions for Parallel Execution

A scheme for decomposing arithmetic
expressions for parallel execution, called the
Parallel Execution String (PES), has been proposed
in [5] [6]. It can be summarized as follows.


Definition

In an expression tree, an operator node is
called

type 1: if all of its operands are
variables or constants;

type 2: if exactly one of its operands is
an operator; and

type 3: if it is a binary operator and
both of its operands are
operators.

Consider an expression in its tree
representation. Those operator nodes, the
operands of which are variables or constants
(i.e., type 1), will be the starting points of the
parallel execution strings. Beginning at the
starting points, these strings are executed in the
direction toward the root node, each of which can
be simultaneously executed by an independent
processor. Each processor executes the type 1
operators in a string one by one at its maximum
speed without waiting. At an operator node where
two strings meet (i.e., type 3), the processor
which reaches this node first will deposit the
partial result it has obtained thus far into a
temporary storage and then stop, whereas the other
processor which reaches this node later will
execute the operation at the merging node and
continue to execute the remaining string. For
example, the expression tree in Figure 1 has three
type 1 nodes: A+B, C*D, and G-H; and hence there
are three parallel execution strings. The two
type 3 nodes in Figure 1 are labeled as #1 and #2,
respectively. Note that the number of type 3
nodes is always one less than the number of type 1
nodes.
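The node classification above can be tried out on the Figure 1 expression. The following is a minimal sketch in Python, not part of the original paper: the nested-tuple tree encoding and the function names `node_type` and `count_types` are assumptions made for illustration, the assignment to J is left out, and only binary operators are handled.

```python
from collections import Counter

# Expression tree for (A+B)*(C*D+E-F/(G-H)) as nested tuples:
# (operator, left operand, right operand). Leaves are plain strings.
TREE = ('*',
        ('+', 'A', 'B'),
        ('-',
         ('+', ('*', 'C', 'D'), 'E'),
         ('/', 'F', ('-', 'G', 'H'))))


def node_type(node):
    """Return 1, 2, or 3 depending on how many operands are operators."""
    _, left, right = node
    operator_operands = sum(isinstance(child, tuple) for child in (left, right))
    return operator_operands + 1


def count_types(node, counts=None):
    """Walk the tree and tally the type of every operator node."""
    if counts is None:
        counts = Counter()
    if isinstance(node, tuple):
        counts[node_type(node)] += 1
        count_types(node[1], counts)
        count_types(node[2], counts)
    return counts


counts = count_types(TREE)
print(counts[1], counts[2], counts[3])  # 3 2 2
```

The three type 1 nodes are the starting points of the three strings, and the two type 3 nodes are the merging points, one fewer than the number of strings.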


The expression J = (A+B)*(C*D+E-F/(G-H))
can be represented as a tree:


To implement this concept in a high-level
language computer, we have to devise a linear
representation for the parallel execution strings
in an expression tree and use it as the internal
language for the high-level language computer.
With this internal language, the entry points of
the strings are chained as a linked list by
pointers called Parallel Pointers. For the
operator where two strings meet, one of its two
operands is the result of the previous operation
in the processor and hence need not be specified,
and the other operand is represented by #i, where
i is a unique number identifying a temporary
storage for the partial result obtained by the
processor executing the other string. The first
of the two merging strings has a Jump Pointer
following the merging point operator and pointing
to the location that immediately follows the
merging point operator in the second string.

To eliminate the need of a stack during the
execution of arithmetic expressions, the ordering
of operands will be reversed in the following
situation: when the result of the previous
operator is the second operand of the current
operator, the first operand will appear as the
second operand in this representation. Thus, if
the operator is non-commutative, it will be marked
with an apostrophe following the operator to
indicate that the ordering of its operands is
reversed.

Figure 1 is an example of representing an
arithmetic expression in the internal language.
In Figure 1, Para represents a Parallel Pointer,
and Jump a Jump Pointer. When a PES Access
Processor executes a Parallel Pointer (see Section
III and Figure 3), it will put the pointer value
into one of the Entry Point Registers so that the
next string can be chosen for execution as soon as
another PES Access Processor becomes free.
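The merging rule and the reversed-operand marking can be illustrated with a small simulation. The sketch below is a hedged reading of the scheme, not the paper's implementation. The step encoding (`load`, `op`, `merge`, `jump`, `store`) is hypothetical. The third string (the one starting at G-H) is reconstructed here from the rules stated above rather than taken from the paper. The "processors" run one after another instead of truly in parallel; running each string until it stops is one valid interleaving, so every ordering is tried.

```python
import operator
from itertools import permutations

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

# Hypothetical step encoding for J = (A+B)*(C*D+E-F/(G-H)):
#   ('load', name)                accumulator := variable
#   ('op', sym, name, rev)        accumulator := accumulator sym variable
#   ('merge', sym, slot, rev)     merging operator using temporary #slot
#   ('jump', string, index)       Jump Pointer into another string
#   ('store', name)               assign the accumulator to a variable
# rev=True plays the role of the apostrophe: operand order is reversed.
STRINGS = [
    [('load', 'A'), ('op', '+', 'B', False),
     ('merge', '*', 1, False), ('jump', 2, 5)],
    [('load', 'C'), ('op', '*', 'D', False), ('op', '+', 'E', False),
     ('merge', '-', 2, False), ('jump', 2, 4)],
    [('load', 'G'), ('op', '-', 'H', False), ('op', '/', 'F', True),
     ('merge', '-', 2, True), ('merge', '*', 1, False), ('store', 'J')],
]


def combine(sym, acc, other, rev):
    """Apply an operator, swapping operands when it is marked as reversed."""
    return OPS[sym](other, acc) if rev else OPS[sym](acc, other)


def run_string(start, env, temps):
    """Run one processor from a string's entry point until it stops."""
    s, pc, acc = start, 0, None
    while pc < len(STRINGS[s]):
        step = STRINGS[s][pc]
        if step[0] == 'load':
            acc = env[step[1]]
        elif step[0] == 'op':
            acc = combine(step[1], acc, env[step[2]], step[3])
        elif step[0] == 'merge':
            if step[2] not in temps:
                temps[step[2]] = acc   # first arrival: deposit and stop
                return
            acc = combine(step[1], acc, temps[step[2]], step[3])
        elif step[0] == 'jump':
            s, pc = step[1], step[2]   # continue in the other string
            continue
        elif step[0] == 'store':
            env[step[1]] = acc
        pc += 1


def evaluate(order, values):
    env, temps = dict(values), {}
    for start in order:
        run_string(start, env, temps)
    return env['J']


VALUES = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 4}
results = {evaluate(order, VALUES) for order in permutations(range(3))}
print(results)  # {45.0}
```

Whichever processor reaches a merging point second performs the operation, and the result is the same for all six orderings, which is the property the temporary storage and the reversed-operand marking are meant to guarantee.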


It can be translated into the internal language as:

Para A B + #1 * Jump

Para C D * E + #2 - Jump

Fig. 1 Example of Translating an Expression
into the Internal Language


2.2 Concurrent Statements


A concurrent statement [7] is a set of
statements enclosed by a header COBEGIN and a
trailer COEND; for example,

COBEGIN
statement 1;
statement 2;
...
statement n
COEND

The statements in a concurrent statement can
be executed simultaneously. A flowchart of the
above concurrent statement is shown in Figure 2.
To execute the concurrent statement, it will be
translated into the internal language as follows:

COBEGIN Para statement 1;
        Para statement 2;
        Para ... ; statement n COEND

Figure 2 Flowchart of a Concurrent Statement

The processor that executes statement n will
execute COEND. The effect of executing COEND is
that the processor will halt its execution
temporarily until all the other processors become
free.

The semicolons in a concurrent statement will
be preserved in the internal language. A
semicolon indicates the end of a simple statement
in a concurrent statement and hence makes the
processor which is executing the simple statement
free.


2.3 IF Statements

IF statements will be processed with a
try-ahead method. The following IF statement

IF condition THEN statement 1 ELSE statement 2;

will be translated into the internal language as
follows:

Para condition IF Para THEN statement 1 THENEND
                       ELSE statement 2 ELSEND

The processor which starts executing the IF
statement will set up the entry to the THEN clause
for a second processor, which in turn sets up the
entry to the ELSE clause for a third processor.
While the first processor is evaluating the
conditional expression, both statement 1 and
statement 2 are being executed simultaneously.
However, any environment changes resulting from
the execution of statement 1 and statement 2
are kept in the local storage of the second and
the third processors, respectively, and will have
no effect either on the execution of the other or
on the evaluation of the conditional expression.

Executing the symbols "THEN" and "ELSE"
causes the processor to enter the THEN state and
the ELSE state, respectively. When the THEN state
processor executes the symbol "THENEND", or when
the ELSE state processor executes the symbol
"ELSEND", the processor will halt its execution in
the WAIT state. However, "THENEND" and "ELSEND"
will have no effect on a processor which is in the
normal mode of operation.

When the first processor executes the symbol
"IF", it interrupts both the second and the third
processors. Depending upon the result of the
conditional expression, it makes one of the two
processors free immediately and discards any
computation the processor has done; any
environment changes made by the other processor
are copied into the main storage and the processor
becomes free. When that is done, the first
processor resumes execution from where the latter
processor was interrupted.
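The try-ahead treatment of IF can be pictured in software terms. The following is a minimal sketch under stated assumptions, not the hardware mechanism itself: the function name `try_ahead_if` and the example branches are hypothetical, each processor's local storage is modelled as a dictionary layered over main storage with Python's `collections.ChainMap`, and the three processors run one after another rather than simultaneously.

```python
from collections import ChainMap


def try_ahead_if(condition, then_branch, else_branch, main_storage):
    # Local storage of the second and third processors: writes land in the
    # local dictionary, reads fall through to main storage.
    then_local, else_local = {}, {}
    then_branch(ChainMap(then_local, main_storage))
    else_branch(ChainMap(else_local, main_storage))

    # The first processor evaluates the condition against main storage,
    # which neither branch has modified.
    chosen = then_local if condition(main_storage) else else_local
    main_storage.update(chosen)  # commit one branch; the other is discarded
    return main_storage


def set_double(env):
    env['y'] = env['x'] * 2


def set_negative(env):
    env['y'] = -1
    env['z'] = 99


storage = try_ahead_if(lambda env: env['x'] > 3, set_double, set_negative,
                       {'x': 5, 'y': 0})
print(storage)  # {'x': 5, 'y': 10}
```

Only the changes of the selected branch reach main storage; the assignment to `z` made by the discarded ELSE branch never becomes visible.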


2.4  WHILE  Statements 


WHILE  statements  end  REPEAT  statements  will 
be  processed  with  the  try-ahead  method  similar  to 
that  for  IF  statements.  However,  only  tha 
repetitive  path  will  be  tried  In  advance. 

Tha  WHILE  etatement 


WHILE  condition  DO  statement  1; 
will  be  translated  as l 


^ara^ condition  WHILE  jIHILEDO  e tatement  1  WHILEND  J. 


*9 


The processor which executes the conditional
expression sets up the entry to the WHILEDO path
for a second processor. The conditional expression
and the WHILEDO path are then executed
simultaneously.

Executing the symbol "WHILEDO" forces the second
processor to enter the WHILEDO state. The
environment changes made by a WHILEDO state
processor do not affect the main storage and are
only kept in the local storage of the processor.
A WHILEDO state processor will halt its execution
in the WAIT state when it executes the symbol
"WHILEND." However, the "WHILEND" will have no
effect on a processor which is in the normal mode
of operation.

When the first processor executes the symbol
"WHILE," it interrupts the second processor. If
the result of the conditional expression is FALSE,
the second processor becomes free immediately and
everything in its local storage will not be used.
The first processor then follows the WHILE pointer
to execute the next statement.

If the result is TRUE, the environment changes
stored in the local storage of the second
processor will be copied into the main storage,
and the second processor becomes free. The first
processor then resumes execution from where the
second processor was interrupted.
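
The try-ahead rule for WHILE statements can be
illustrated with a small sequential simulation. This
is only a sketch under stated assumptions: the two
processors are simulated one after the other, the
second processor is assumed to have reached the WAIT
state before it is interrupted, and the names
try_ahead_while, main_storage and local_storage are
hypothetical and do not appear in the paper.

```python
from collections import ChainMap


def try_ahead_while(main_storage, condition, body):
    """Simulate WHILE condition DO body with a try-ahead body.

    Returns the number of iterations whose changes were committed.
    """
    committed = 0
    while True:
        # Local storage of the second processor: writes stay here,
        # reads that miss fall through to the main storage.
        local_storage = {}
        view = ChainMap(local_storage, main_storage)
        body(view)  # second processor tries the repetitive path ahead

        # First processor evaluates the condition on the main storage,
        # which the try-ahead path has not touched.
        if not condition(main_storage):
            return committed  # FALSE: the local storage is discarded

        # TRUE: copy the environment changes into the main storage.
        main_storage.update(local_storage)
        committed += 1


def sum_body(env):
    env["s"] = env["s"] + env["i"]
    env["i"] = env["i"] + 1


storage = {"i": 0, "s": 0}
count = try_ahead_while(storage, lambda env: env["i"] < 3, sum_body)
print(count, storage)
```

The last speculative pass computes values that are
never copied, which mirrors the case where the
condition turns out FALSE and the second processor
simply becomes free.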
2.5 REPEAT Statements


The REPEAT statement

REPEAT
    statement 1;
    statement 2;
    ...
    statement n
UNTIL condition;

will be translated as:

[Diagram: the translated string consists of
statement 1, statement 2, ..., statement n followed
by Para condition UNTIL, together with the added
symbols REPEAT and REPEATEND. The exact placement of
these symbols and of the pointer arrows is not
recoverable from the source.]

The processor which executes the conditional
expression sets up the entry to the REPEAT path
for a second processor. The conditional expression
and the REPEAT path are then executed
simultaneously. Executing the symbol "REPEAT"
forces the second processor to enter the REPEAT
state. The environment changes made by a REPEAT
state processor do not affect the main storage and
are only kept in the local storage of the
processor.

The symbol "REPEATEND" is added to the statement.
Execution of "REPEATEND" will have no effect on a
processor which is in the normal mode of
operation. Once a REPEAT state processor executes
the "REPEATEND", it will halt its execution in the
WAIT state.

When the first processor executes the "UNTIL," it
interrupts the second processor. If the result of
the conditional expression is TRUE, the second
processor becomes free immediately. The first
processor then follows the UNTIL pointer to
execute the next statement.

If the result is FALSE, the environment changes
stored in the local storage of the second
processor will be copied into the main storage,
and the second processor becomes free. The first
processor then resumes its execution from where
the second processor was interrupted.


III. Architecture


The architecture of a high-level language computer
which can execute the internal language as
described in Section II is shown in Figure 3. It
consists of PES Memory, Main Memory, Partial
Result Storage, a Scanner, an I/O Processor, and a
number of PES Access Processors, Entry Point
Registers, Syntactic and Semantic Processors,
Local Storage, and Arithmetic Processors. The
various kinds of processors are operating
simultaneously in a pipelined manner, and the
organization is similar to the one proposed by
Haynes [1] except that multiple identical
processors are also used for parallel
computations. Since we are interested only in the
parallel execution aspects of the architecture,
other features which are the same as Haynes [1]
will not be duplicated here.


PES Access Processors


The PES Memory stores the internal representation
of the source programs. During the translation
phase, the PES Access Processor receives program
tokens in the internal form from the associated
Syntactic and Semantic Processor, assembles and
stores them into the PES Memory. During the
execution phase, each PES Access Processor reads
the program from the PES Memory, separates and
delivers the symbols to the Syntactic and Semantic
Processor that it is attached to. A free PES
Access Processor will start executing a (parallel)
execution string by using a non-empty value from
one of the Entry Point Registers as a starting
address in the PES Memory for execution. After
that the Entry Point Register is cleared.

The PES Access Processor can continue reading from
the PES Memory until either its buffers are full
or it has read a semicolon, which indicates the
end of a simple statement in a concurrent
statement.

Parallel Pointers and Jump Pointers are executed
by PES Access Processors. When a PES Access
Processor reads a Parallel Pointer, it puts the
pointer value and its processor identification
into one of the Entry Point Registers and
continues its processing. When a PES Access
Processor reads a Jump Pointer, it simply alters
its program counter and reads the program from the
new location.


Syntactic and Semantic Processors


Each PES Access Processor is attached to a
Syntactic and Semantic Processor. The PES Access
Processor and its associated Syntactic and
Semantic Processor are operating concurrently and
asynchronously. The communication between them is
carried out by the buffers in the PES Access
Processor and a counting semaphore. During
translation, the Syntactic and Semantic Processor
receives program tokens from the Scanner, performs
syntax analysis, translates the program into the
internal language, and delivers the resulting
program to the PES Access Processor. During the
execution phase, the Syntactic and Semantic
Processor executes various types of operators sent
by its PES Access Processor, such as IF, THEN,
ELSE, BEGIN, WHILE, REPEAT, etc. It also sends
commands to its Arithmetic Processor and the I/O
Processor. A Syntactic and Semantic Processor can
also alter the program counter of its PES Access
Processor when it executes a "GOTO", "WHILE", or
"UNTIL." Each Syntactic and Semantic Processor has
its own local memory to temporarily store the
environment changes during try-ahead processing.

The Syntactic and Semantic Processors are
interconnected to each other so that when a
try-ahead path is taken by a Syntactic and
Semantic Processor, it can send its processor
identification to the processor which is executing
the conditional expression. After the conditional
expression is evaluated, the latter processor will
interrupt the former and take the appropriate
actions as described in Section II.


An Arithmetic Processor is connected to each of
the Syntactic and Semantic Processors. When a
Syntactic and Semantic Processor receives an
operand from its PES Access Processor, it saves
the type and value of the operand into its operand
registers. When it receives an arithmetic
operator, it directs its Arithmetic Processor to
perform the operation on the operands stored in
its operand registers. The Arithmetic Processor
will check the types of the operands, and perform
all type conversions if needed. The results of an
arithmetic operation are stored into the operand
registers of the Syntactic and Semantic Processor
which has sent the operator. Our scheme used here
will not require any stack for arithmetic
expression executions, and, at any time, no more
than two operands will be in the operand registers
of a Syntactic and Semantic Processor. A stack is
used in the main storage only to allocate space
when a block or procedure is entered.


The Partial Result Storage is to temporarily store
the partial results obtained during the execution
of an expression. Each location in the Partial
Result Storage has a tag associated with it to
indicate whether it is empty or full. All tags are
cleared initially to indicate "empty." When a
Syntactic and Semantic Processor receives a
partial result operand, i.e., an operand of the
form #i, from its PES Access Processor, it will
check the tag of location i in the Partial Result
Storage. If it indicates "empty", the Syntactic
and Semantic Processor will save the contents of
its operand registers into location i of the
Partial Result Storage and set the tag to indicate
"full". The Syntactic and Semantic Processor then
becomes free. If the tag indicates "full," the
Syntactic and Semantic Processor will reset the
tag to indicate "empty", read the contents of
location i into its operand registers, and use
them as the operand for the next operation.
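
The empty/full tag protocol described above can be
sketched in a few lines. This is a minimal
single-threaded model, not the hardware design: the
class name PartialResultStorage, the method access
and the return convention are assumptions made for
illustration only.

```python
class PartialResultStorage:
    """Tagged storage: each location is either "empty" or "full"."""

    def __init__(self, size):
        self.full = [False] * size   # all tags start as "empty"
        self.cells = [None] * size

    def access(self, i, operand_registers):
        """Handle a partial result operand for location i.

        Returns (processor_freed, operand):
          - tag was "empty": store the registers, mark "full",
            and the calling processor becomes free;
          - tag was "full": mark "empty" and hand back the stored
            value as the operand for the next operation.
        """
        if not self.full[i]:
            self.cells[i] = operand_registers
            self.full[i] = True
            return True, None
        self.full[i] = False
        return False, self.cells[i]


prs = PartialResultStorage(4)
first = prs.access(1, 7)    # first processor arrives, deposits 7
second = prs.access(1, 3)   # second processor arrives, picks up 7
print(first, second)
```

Whichever of the two processors reaches the
location first deposits its value and is released;
the other continues the computation with both
operands, so no explicit ordering between them is
needed.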


In this paper we have presented an internal
language for a high-level language computer, in
which arithmetic expressions and concurrent
statements are expressed as parallel executable
strings. Try-ahead operations are performed for
IF statements, WHILE statements, and REPEAT
statements. For an IF statement, both the THEN
path and the ELSE path are tried simultaneously,
while the conditional expression is being
executed. The wrong path is later discarded, and
the right path activated. For WHILE statements
and REPEAT statements, only the repetitive path is
tried ahead, since it is the one more likely to be
correct. The resulting system can increase its
processing speed over other designs through
distributed processing of various tasks by
multiple independent processors, through parallel
execution of arithmetic expressions and concurrent
statements, and through try-ahead processing of
the statements involving conditions.


Appendix 


A Translation Algorithm


The algorithm is to translate an arithmetic
expression into the internal language form
described in Section 2.1. During the translation
process, two stacks will be used: OPR-STK and
OPN-STK, for storing operators and operands,
respectively. Dollar signs ($) will also be used
in OPN-STK. LC is the Location Counter which
contains the address of the location for storing
the next output. Two variables are used:
TEMP-COUNTER is for the number of temporary
storage locations used, and PES-BEGIN is the
starting address of the string currently being
generated. An array TEMP-POINTER (TP) is used in
the algorithm. TP(i) stores the address of the
first of the two #i's in the output, so that when
the second #i is generated, a Jump Pointer to the
second #i can be generated at the location
following the first #i.

The algorithm is similar to that of translating an
expression into a reverse Polish string, except
that operands are not written out immediately and
its operator output procedure is more complicated.
A hardware translator can be easily implemented in
the Syntactic and Semantic Processor [8]. Figure 4
is the flowchart of the algorithm.
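
For comparison, the classic reverse Polish
translation that the text refers to can be sketched
as follows. This is the well-known baseline, not
the paper's algorithm: it writes operands out
immediately and generates no Parallel or Jump
Pointers. The function name, the PRIORITY table and
the use of ">=" (to make operators left-associative)
are assumptions of this sketch.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_reverse_polish(tokens):
    """Translate an infix token list into a reverse Polish token list."""
    output = []
    opr_stk = []  # operator stack, playing the role of OPR-STK
    for s in tokens:
        if s == "(":
            opr_stk.append(s)
        elif s == ")":
            # Pop operators back to the matching '('.
            while opr_stk and opr_stk[-1] != "(":
                output.append(opr_stk.pop())
            if not opr_stk:
                raise ValueError("unbalanced ')'")
            opr_stk.pop()
        elif s in PRIORITY:
            # Pop operators of higher or equal priority first.
            while (opr_stk and opr_stk[-1] != "("
                   and PRIORITY[opr_stk[-1]] >= PRIORITY[s]):
                output.append(opr_stk.pop())
            opr_stk.append(s)
        else:
            output.append(s)  # operand: written out immediately
    while opr_stk:
        op = opr_stk.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output


print(to_reverse_polish("( A + B ) * ( C - D )".split()))
```

In the paper's variant the operands stay on OPN-STK
until their operator is popped, which is what lets
independent subexpressions such as A + B and C - D
be emitted as separate parallel execution strings.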


Main Procedure

1. Clear TP array. Initialize TEMP-COUNTER ← 0;
   PES-BEGIN ← LC.

2. S ← next input symbol.

3. If S is a '(', then push OPR-STK('(') and go
   to 2,
   else if S is a variable, then push OPN-STK(S),
   else ERROR.

4. S ← next input symbol.

5. WHILE Priority(OPR-TOP) > Priority(S) DO
   POP-OPR-STK.

6. If S is a ')' and OPR-TOP = '(', then pop
   OPR-STK and go to 4.
   If S is a ')' and OPR-TOP is not '(', ERROR.

7. If S is an arithmetic operator, then push
   OPR-STK(S) and go to 2.

8. If S is 'end-of-expression' and OPR-TOP is not
   a '(', then DONE else ERROR.


Procedure POP-OPR-STK

Case 1  The OPR being popped is a unary operator:

  Case 1.1  OPN-STK(TOP) is a variable:

    1. FINISH-PREVIOUS-PES.
    2. Pop OPN-STK, and output it.
    3. Output OPR.
    4. Push $(TEMP-COUNTER + 1) onto OPN-STK.

  Case 1.2  OPN-STK(TOP) is a $k:

    1. Output OPR.

Case 2  The OPR being popped is a binary operator.
        Depending upon the top two elements on
        OPN-STK, there are three cases:

  Case 2.1  Both of the two elements are
            variables:

    1. FINISH-PREVIOUS-PES.
    2. Output the top two elements from OPN-STK.
    3. Replace the top two elements on OPN-STK by
       $(TEMP-COUNTER + 1).
    4. Output OPR.

  Case 2.2  One element is a variable, and the
            other is a $i:

    1. Output the variable.
    2. If OPR is non-commutative and OPN-STK(TOP)
       is $i, then output OPR', else output OPR.
    3. Replace the top two elements on OPN-STK by
       $i.

  Case 2.3  Both of the two elements are $'s. Let
            OPN-STK(TOP-1) be $k, and let
            OPN-STK(TOP) be $j:

    1. Output #k.
    2. If OPR is non-commutative, then output
       OPR', else output OPR.
    3. Output OPR to the location pointed to by
       TP(k).
    4. Output a Jump Pointer with the content of
       LC to the location pointed to by TP(k)+1.
    5. Replace the top two elements on OPN-STK by
       $j.


Procedure FINISH-PREVIOUS-PES

1. TEMP-COUNTER ← TEMP-COUNTER + 1.

2. Output #(TEMP-COUNTER).

3. TP(TEMP-COUNTER) ← LC; increment LC by 2.

4. Output a Parallel Pointer with the content of
   LC to the location addressed by PES-BEGIN.

5. PES-BEGIN ← LC.

Figure 4a  Main Procedure of the Translation Algorithm

Abstract

This paper presents a conceptual design of the
direct-execution Fortran computer. First, some
modifications are introduced into the language
Fortran to insure simpler execution and better
performance. Next follows a brief discussion of
the architecture of this computer and then, in
more detail, of the direct-execution process of
some typical Fortran statements, which may furnish
an outline of the work of this computer. Finally,
some comments are made on the possible development
of the direct-execution high-level language
computer.


I. Introduction

With the rapid advance of the science and
technology of computers and electronics, the cost
of the hardware becomes cheaper and that of the
software becomes more expensive day by day. This
makes it both possible and necessary to design the
direct-execution high-level language computer.
Since the language Fortran is the most widely used
high-level language, the research and design of
the direct-execution Fortran computer may not be
ill-advised.

This paper gives a conceptual design of the
direct-execution Fortran computer. It uses the
ANSI Basic Fortran as the fundamental language,
but in order to insure simpler execution and
better performance, some modifications are
introduced into this language as follows.

(1) The main program, preceded by the keyword
    "Master", should be put at the end. For
    interaction between man and machine, where
    inputting is carried out token by token, the
    main program should be put at the front of
    the whole program. But in that case the
    labels of each program unit should be limited
    to a certain number.

(2) The types of all the variables and arrays
    should be declared explicitly, especially the
    dummy arguments of the statement function.

(3) In order to distinguish between the
    non-executable and executable statements, a
    keyword "Macro" is placed at the beginning of
    each statement function. The dummy arguments
    of a statement function are localized in the
    program unit of this statement function.

(4) The EQUIVALENCE statement is deleted.

The architecture of the direct-execution Fortran
computer is described briefly in section II of
this paper. The direct-execution procedures of
some typical statements of the language Fortran
are discussed in section III. We believe that this
may furnish an outline of the work of this
direct-execution Fortran computer. Finally, some
comments are made on the possible development of
the direct-execution high-level language computer
in section IV.

II.  Architecture 

The direct-execution high-level language computer
should execute the program written in this
language directly according to its lexicon, syntax
and semantics without using the traditional and
complicated multilayer software (such as
compilers, assemblers, loaders, etc.). Thus, its
architecture should reflect the structures of
lexicon, control and data of this high-level
language, so that the program written in this
language may be treated more efficiently.

The computer architecture diagram proposed is
shown in Fig. 1. It consists of a Program Memory
PM (to store the user's program), a Data Memory DM
(to store the relevant data) and four processors
(Input/Output Processor I/O P, Lexical Processor
LP, Control Processor CP and Data Processor DP).
Among the processors there are also the control
bus, the address bus, the data bus and some
registers to store information temporarily. These
processors may be microprocessors or built up with
LSI chips. They may operate in parallel and
synchronously with each other in order to increase
the processing speed.

The user's program may be input into the PM either
all at once, or token by token, executing and
storing simultaneously to allow interaction
between man and machine. After treatment by I/O P
the user's program is input into the PM in a
definite form; namely, with a terminal character
at the end of each statement and two tagging
characters, one at the beginning of each program
unit and the other at the end of the whole
program. These tagging characters are called unit
heads and program end characters respectively.
They are in the first position of the label region
and are different from any ordinary characters
used by Fortran. The codes stored in the PM may be
either ASCII or compressed internal codes.

LP is used for lexical analysis. It includes the
SAM (Scanner Associative Memory), which stores
legal characters, etc. There are two working modes
for LP, controlled by CP: scanning and executing.
In the scanning mode, LP checks the characters
sent from PM for a terminal character, so as to
find the label region (since the label region is
just next to the terminal character). After LP
finds the label, the unit head tag and the
character "D" at the first position of the
statement, LP is transferred to the executing
mode. In the executing mode, it checks the
legality of characters sent from PM, spells them
into tokens and sends them to CP and/or DP.

If the tokens are a string of numbers, it sets a
register to "1", makes a conversion, puts the
converted codes into the VALUE register and then
sends them out.

CP consists of the CAMU (Unit Head Control
Associative Memory), the CAML (Label Control
Associative Memory), the CAMS (Reserved Word
Control Associative Memory), the R Stack (Return
Stack), the DO Stack, the CALL Stack and the
MDLREG (Mode of DP and LP Register). CP is the
control center of this computer. When the main
program is executed or a subprogram is called, it
sets DP into the operating mode by means of the
register MDLREG. Otherwise it may set DP into the
syntax mode. The working modes of LP are also set
by means of the register MDLREG.

The R Stack is used for reserving the return
position, the DO Stack is used for reserving the
DO statement information, and the CALL Stack for
reserving relevant information of local quantities
when a called subprogram is executed. When LP
outputs a unit head or a label, CP should fill the
entries of CAMU and CAML respectively for looking
up in the control statements concerned.

DP consists of DAM (Data Associative Memory), some
stacks (OP Stack, V Stack, L Stack and F Stack)
and register DMFT. In the syntax mode, for
non-executable statements (declaration part) it
fills the corresponding entries of DAM for the
variables and arrays but does not allocate any
cells in DM (except for COMMON statements). For
executable statements no treatment should be
necessary; it is a matter of CP starting the LP to
continue the scanning. When DP is in the operating
mode, it not only fills the corresponding entries
of DAM, but also allocates cells in DM for them.
Then it calculates the values of executable
statements and assigns values to them. The
register DMFT points to the first usable location
of the free space in DM. (After the called
subprogram returns, the space in DM allocated to
it should be released for other uses.) The OP
Stack stores operators. The V Stack stores the
values of operands. The L Stack stores the logical
operands.

Besides, DM is the Data Memory; data stored in it
are tagged to indicate the type of data. The
scanner pointer SP is a pointer which points to
the location of the character being treated in PM.
The RESULT register stores the operating results
to control the DO and IF statements.

III. Direct-execution of some statements

Before executing we assume that the user's program
is stored in PM already. The pointer SP points to
the first character of the program in PM and LP is
in the scanning mode.

1. Treatment of unit head statements

When LP scans the unit head tag, it is changed to
the executing mode, spelling the characters into
tokens to be output. CP receives and analyses the
tokens to determine the type, class and name of
this program unit. Then it fills these items into
the corresponding fields of the global CAMU as
shown in Fig. 2, where DAMPT points to the first
location of the local quantities in DAM, PMPT
points to the first character location of the
first executable statement of this unit in PM, and
LPT points to the location of the first label of
this unit in CAML. These pointers should be filled
before the unit is called.

If this unit is a function subprogram, DP should
be activated to fill its name into the DAM of this
unit as shown in Fig. 3. Then the location of the
first entry in DAM of this unit should be put into
the field DAMPT in CAMU. Finally, a blank should
be left between two neighboring units in DAM,
CAML, etc., to indicate the end of the units.

For a subprogram with dummy arguments, after CP
recognizes a dummy argument, it activates DP to
fill the entries of this block successively and
put a dummy symbol in the field Dummy. When it
encounters the character ")" it fills the number
of dummy arguments in the field SIZE.




2. Treatment of declaration statements

When CP encounters the names of variables or
arrays of the non-executable statements, it puts
them into the DSR (Data Search Register) and then
activates DP to find out whether there are such
names in DAM or not. If there are, DP fills the
corresponding fields. If there are not, it
allocates new entries.

The field STRUCTURE indicates whether the
structure of the name is a variable, an array or a
function. The field TYPE indicates whether its
type is real or integer; the field COMMON
indicates whether it is allocated in the COMMON
region or not; and the field SIZE indicates the
volume of the array or the number of the dummy
arguments. Besides, the field PT1 points to the
location of the variable (or the location of the
first component of the array) stored in the DM. In
our scheme, for all the quantities of
non-executable statements except those specified
by the COMMON statements, we do not allocate any
cells in DM; that is to say, we do not fill PT1
until this unit is called by another program unit.
Of course, for these quantities in the main
program, cells are allocated. The pointer PT8
stores the information for calculating the
location of the components of an array, so it is
filled for arrays only.

3. Treatment of the statement function

The treatment of the statement function is to fill
its name and dummy arguments together with their
types into the DAM but not to allocate any cells
in the DM. When DP encounters the token "=", the
content of SP should be filled in the field PTJ in
the DAM, and then CP sets LP to scan until the
terminal symbol of this statement is encountered.

4. Treatment of the Definitional Label

Before encountering the first executable statement
of the main program, DP is in the syntax mode,
i.e. it only treats the non-executable statements
as discussed above, while for the executable
statements it treats the definitional label only.

The treatment of the definitional label is to fill
the label in the entry of the CAML of its unit
according to the sequence of its appearance in the
program as shown in Fig. 4, where PMPT (Program
Memory pointer) points to the location of the
first character of the statement of this label in
PM. DL indicates the nesting depth of DO loops and
is used for preventing the program from
transferring into an inner layer of the loops.

For the executable statements of the subprogram,
LP is set in the scanning mode to scan the label
region and the first character of the statement.
If a label is encountered, CP fills one entry of
CAML and "0" in its DL field to indicate that the
label is not in any loop body.

If the first character of the statement is not the
letter "D", then the LP should continue scanning.
If it is, the LP should output a token. When CP
does not encounter the keyword "DO", it sets LP to
scanning mode again; if it does encounter "DO",
the following statement should be a DO statement
such as:

DO L i = m1, m2, m3

Then CP pushes the label of the terminal statement
L into the field TL of the DO stack (the other
fields remain blank) and pushes the return
location (i.e. the first character of the loop
body) into the R stack as shown in Fig. 5.

When a definitional label is encountered, CP fills
an entry of CAML and "0" into its field DL as
well. When the label of the terminal statement L
is encountered, besides filling it into the CAML,
the DO stack and R stack should be popped, and the
values of the DL of all labels within this DO loop
should be increased by "1".

For multi-nested DO loops, say, with three nested
layers, after SP transfers out of all the DO loops
the DL value of the innermost layer is 3, that of
the middle layer is 2 and that of the outermost
layer is 1. The treatment of definitional labels
in the main program will be discussed in the next
paragraph.
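
The bookkeeping of the nesting-depth field during
scanning can be sketched as follows. This is an
illustrative model only: the event-list input
format, the function name assign_dl and the use of
a Python dictionary in place of the label
associative memory are assumptions, and a loop is
taken to include its own terminal label.

```python
def assign_dl(events):
    """Compute the nesting-depth value of every label after scanning.

    events is a list of ("do", terminal_label) or ("label", name)
    tuples in source order.
    """
    dl = {}         # label -> nesting-depth value
    order = []      # labels in order of appearance
    do_stack = []   # (terminal label, index of first label in the body)
    for kind, name in events:
        if kind == "do":
            do_stack.append((name, len(order)))
            continue
        dl[name] = 0            # every new label starts at "0"
        order.append(name)
        # Terminal label reached: pop the loop and raise the depth
        # of every label inside its body by one.
        while do_stack and do_stack[-1][0] == name:
            _, start = do_stack.pop()
            for label in order[start:]:
                dl[label] += 1
    return dl


scan = [
    ("do", "30"), ("label", "10"),
    ("do", "20"), ("label", "11"),
    ("do", "15"), ("label", "12"), ("label", "15"),
    ("label", "20"), ("label", "30"), ("label", "40"),
]
print(assign_dl(scan))
```

With three nested loops the innermost labels end
at 3, the middle ones at 2 and the outermost ones
at 1, while a label outside every loop stays at 0,
which matches the values stated in the text.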

5. Treatment of DO statements

When DP is in the operating mode to execute the DO
statement "DO L i = m1, m2, m3", if the terminal
statement label L is found in the CAML, the PMPT
of L is pushed into the field TL of the DO stack,
m1 is assigned to i, and the locations of i, m2
and m3 are pushed into the fields CV (Control
variable), FP (Final parameter) and IP
(Incremental parameter) of the DO stack
respectively. The return location is pushed into
the R stack (Return Stack) also, as shown in
Fig. 5. Labels which are found in the CAML with
addresses both less than or equal to that of the
terminal label L and greater than or equal to that
of the R stack are within the loop body. Then the
values of the field DL of all the labels within
the loop body should be decreased by "1". The
values of DL of this layer now equal "0", which
indicates that these labels are in the same layer,
so that they may be transferred to. When a nested
DO statement is encountered, CP goes through the
whole procedure discussed above. Having executed
the terminal statement of the DO loop, CP
activates DP to calculate i = i + m3 and
RESULT = i - m2. If RESULT ≤ 0, then the top
items of the R Stack should be copied into SP to
execute the loop again. If RESULT > 0, the loop
execution is completed, and the values of the
field DL of all the labels within the loop body
should be increased by "1". This indicates that
these labels within this loop should not be
transferred to. Then the DO stack and R stack
should be popped.

If L is not found in CAML, the statements in the
loop body should be executed. When the
definitional labels are encountered, CP allocates
the entries of CAML to them and fills the fields
DL with "0". Having executed the terminal
statement of the DO loop, treat them as discussed
above.
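
The loop-control arithmetic at the end of each pass
can be written out as a short sketch. It assumes
the conventional DO statement form with an initial
value m1, a final value m2 and an increment m3, and
it models only the control variable and the
comparison result; the function name run_do_loop
and the body callback are hypothetical, and the
stacks and label fields are left out.

```python
def run_do_loop(m1, m2, m3, body):
    """Execute a DO loop whose test follows the terminal statement.

    Returns the value of the control variable after the loop.
    """
    i = m1                      # initial value assigned on entry
    while True:
        body(i)                 # loop body, up to the terminal statement
        i = i + m3              # increment the control variable
        result = i - m2         # comparison value kept for loop control
        if result > 0:
            return i            # loop completed
        # result <= 0: return to the start of the loop body


visited = []
final = run_do_loop(1, 5, 2, visited.append)
print(visited, final)
```

Because the test follows the terminal statement,
the body always runs at least once, even when the
initial value already exceeds the final value.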

6. Treatment of GOTO and IF statements

If the GOTO L statement is not in any DO loop, and
the label L is found in the corresponding region
of CAML, CP checks the value of the field DL; if
it is "0", the program may transfer to the label
L; otherwise, an error has occurred. If the label
L is not found in the corresponding region of
CAML, CP sets LP and DP into scanning and syntax
modes respectively, scanning the program to find
the L. The treatment is similar to paragraph 4.

When the L is found in the program, in order to
prevent transfer into the inner layer of a DO loop
from the outer layer, the transfer to L should not
be made immediately (although the value of its
field DL is "0" at that time). The location of L
in CAML should be stored in the temporary register
TP. LP scans the program continuously until it
returns to the same layer as this GOTO L
statement, i.e. the DO loop layers whose fields
CV, FP and IP in the DO stack are blank should be
scanned out and the values of field DL should be
increased. Then the values of the fields PMPT and
DL in CAML should be found by means of the content
of TP. If the value of the field DL is "0", then
the program transfers to L; otherwise an error has
occurred.

For a GOTO L statement lying in a certain nested DO loop layer, it
is necessary to find L within the current nested layer of CAML.
(If it is not found in CAML, LP should be set in the scanning mode
to scan the program to find the L of this DO loop layer in the
program as discussed above.) If L is found and its DL value is "0",
the program should be transferred to L. If L is not found, the
values of the field DL of all the labels within this loop layer
should be increased by "1". CP pops the top of DO stack and R stack
and goes on to find the label in the outer layer (in the CAML or in
the program). The above process is repeated again and again until
L is found and the program is transferred to it.
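
The DL bookkeeping described above can be illustrated with a minimal sketch. It is not the authors' hardware algorithm: the class name LabelTable and its methods are hypothetical, and the scanning of the source program for labels not yet in CAML is left out. It only shows how a DL value of "0" marks a label as a legal GOTO target and how closing a DO loop raises DL for every label in its body.

```python
class LabelTable:
    """Hypothetical stand-in for the label region of CAML."""

    def __init__(self):
        self.dl = {}           # label -> DL value; 0 means "may be a GOTO target"
        self.open_loops = []   # one list of labels per open DO-loop layer

    def enter_loop(self):
        self.open_loops.append([])

    def define(self, label):
        # A definitional label gets an entry with DL = 0.
        self.dl[label] = 0
        if self.open_loops:
            self.open_loops[-1].append(label)

    def exit_loop(self):
        # On loop completion every label in the loop body has DL increased by 1.
        closed = self.open_loops.pop()
        for label in closed:
            self.dl[label] += 1
        # Labels of a closed inner loop still lie inside the enclosing loop body.
        if self.open_loops:
            self.open_loops[-1].extend(closed)

    def goto_allowed(self, label):
        # Unknown labels return False here; the machine would scan the program instead.
        return self.dl.get(label) == 0
```

A label defined inside a loop is a legal target while the loop is open and becomes illegal once the loop has been left, which is the check CP performs on the DL field.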

The execution of the IF statement IF (e) L1, L2, L3 is similar to
the GOTO statement. When CP recognizes the keyword "IF" it
activates DP to calculate the expression and puts its logical
result (less than, equal to or greater than zero) into the RESULT
register. Then according to this result CP puts the value of the
corresponding HOT of L1, L2 or L3 into SP to perform the transfer.
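
As a small illustration of the three-way selection, the following sketch picks the target label from the sign of the expression value. The function name arithmetic_if is an assumption made for this example; in the machine the selected label's location would then be loaded into SP.

```python
def arithmetic_if(value, l1, l2, l3):
    """Return the label selected by IF (e) L1, L2, L3 for a given value of e."""
    if value < 0:
        return l1
    if value == 0:
        return l2
    return l3
```

For example, arithmetic_if(-2.5, "10", "20", "30") selects label "10".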

7.  Treatment  of  the  call  of  statement  function


In the case of calling a function subprogram or a statement
function, it is necessary to find the name of the function in the
region of the current operating program unit of the DAM. If this
name is found it is a statement function; otherwise, it may be a
function subprogram. For a function subprogram, its name could be
found in the CAMU.

CP copies the values in the fields rvPT, DAMPT and LPT of the CAMU
into the fields of the temporary register. CP allocates a cell in
DM for the function name to store values of the function. Then the
CP recognizes the actual arguments; say there are three arguments:
A (variable), 3 (constant) and C + D (expression). CP activates DP
to find out (by DAM) the location in DM allocated for A. The
locations of A and those of temporary cells allocated for constant
3 and the result of expression (C + D), together with the location
of the function name, should be copied into the fields PT1 of dummy
arguments and the function name of the called subprogram in DAM
respectively. Then DP pushes the value of the field PT1 of the
function name into P stack, as shown in Fig. 6.

In our scheme we use call by name. Certainly during the process of
substitution some syntax checking (such as whether the numbers of
the actual and dummy arguments are equal, whether the types of both
arguments are the same, etc.) should be made. When the character
")" has been treated, the return location in PM (the value of SP)
should be pushed into R stack, and the values of DAMPT and LPT of
TR and that of FDMPT pushed into the CALL stack as shown in the
figure. The value of HOT of CAMU in the temporary register should
be put into the pointer SP to perform the transfer. In executing
the executable statements of the function subprogram, DP should
allocate cells in DM for local variables which have not been
allocated yet, and should modify FDMPT also. Once the RETURN
statement is encountered CP puts the value of R stack into SP, pops
the CALL stack and R stack and clears all the fields PT1 of the DAM
of this subprogram. The calculated result is now automatically
available in the cell of the function name.
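
The stack discipline of call and return can be sketched as follows. This is only an illustration under assumptions: the class CallState and its method names are invented for the example, the saved pointer values are passed as an opaque tuple, and argument substitution through the DAM is omitted.

```python
class CallState:
    """Hypothetical model of SP, the R stack and the CALL stack."""

    def __init__(self):
        self.sp = 0            # source pointer into the program text
        self.r_stack = []      # return locations
        self.call_stack = []   # saved pointer values of the calling unit

    def call(self, entry, saved_pointers):
        # Push the return location and the caller's pointers, then transfer.
        self.r_stack.append(self.sp)
        self.call_stack.append(saved_pointers)
        self.sp = entry

    def ret(self):
        # RETURN: restore SP from the R stack and pop the CALL stack.
        self.sp = self.r_stack.pop()
        return self.call_stack.pop()
```

Because both stacks are pushed and popped together, nested calls return in the reverse order of their activation.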

The call of a statement function is very similar to that of the
function subprogram, but the value of 1T3 of the statement function
in the DAM should be put into SP instead of HOT in CAMU of the
function subprogram; at the same time, it is not necessary to alter
the CALL stack.

Since the call of a subroutine statement is preceded by the keyword
"CALL", it is easier to recognize. The treatment of the call of a
subroutine is rather similar to that of a function subprogram.


IV.  Conclusion 


The language Fortran has been in use for many years in scientific
computation and is familiar to most computer users. Since, however,
quite a lot of trouble is involved in the use of the language
Fortran for direct execution, we have to modify it properly in
designing the high-level language computer.

Today, the computer hardware and the computer software hold a
relation of mutual impetus, mutual penetration and mutual
constraint. The development of the computer language and
programming has greatly affected computer architecture, as is
shown in the improvement from classical computer architecture to
high-level language computer architecture. On the other hand, the
development of computer architecture also leads to a development
of languages, such as the HIM language proposed by Prof. Yaohan
Chu.

To sum up, the development of the high-level language computer
should lead to a close merging of the programming language and the
computer architecture; that is, the language and the computer
architecture ought to execute the program effectively and the
programming also ought to satisfy the requirements of the language
and architecture, so as to improve the reliability and practicality
as well as the cost-efficiency of the whole system. So in the long
run, it is necessary to reconsider and redesign new languages from
the point of view of programming and computer architecture.

Indeed the conception of structured programming and structured
language has appeared already, but the languages evolved are not
solely dedicated to the high-level computer.

Since the said HIM language has not won wide acceptance yet, we
think it necessary to design some new computer architecture for the
current languages such as Fortran, Cobol etc. This is our
motivation in writing this paper.
Abstract

This paper reports a JOVIAL direct-execution machine which accepts
a subset of the JOVIAL J73 language. It describes the J73 subset.
It also describes the organization of the JOVIAL direct-execution
computer which reflects the language constructs of the J73. There
are 3 processors, 3 associative memories, a program memory, a data
memory and 10 interfacing registers. The memory/register/stack
structures and direct-execution algorithms of the processors are
described.


1.  Direct  Execution  Computer 

Direct-execution refers to the operating mode of a high-level
architecture. This operation mode directly accepts and executes a
high-level language program without the need of multiple layers of
conventional software. As a result, there is no compiler, no
assembler, and no linkage editor. The high-level programming
language is the machine language that the bare hardware recognizes.
A direct-execution computer is capable of operating in the
direct-execution mode.

The  direct-execution  computer  (uj  Is 
structured with a direct-execution cycle; this is shown in Fig. 1.
A high-order language program is stored in the program memory. The
lexical processor fetches the next token from the program memory
and delivers the token to the language processor; the language
processor executes the token accordingly. This cycle continues
until the program ends.
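
The cycle can be sketched in a few lines. The example below is an illustration only: the toy language (blank-separated integers with postfix "+" and "*") is an assumption chosen to keep the sketch short and is not JOVIAL, and the class names are hypothetical. What it shows is the division of labor: one component forms tokens from the stored source text, the other executes each token as it arrives, and no translated form of the program is ever produced.

```python
class LexicalProcessor:
    """Forms tokens directly from the source text in program memory."""

    def __init__(self, program_memory):
        self.pm = program_memory
        self.ptr = 0

    def next_token(self):
        while self.ptr < len(self.pm) and self.pm[self.ptr].isspace():
            self.ptr += 1
        if self.ptr >= len(self.pm):
            return None                      # end of program
        start = self.ptr
        while self.ptr < len(self.pm) and not self.pm[self.ptr].isspace():
            self.ptr += 1
        return self.pm[start:self.ptr]


class LanguageProcessor:
    """Executes each token as soon as it is delivered."""

    def __init__(self):
        self.stack = []

    def execute(self, token):
        if token in ("+", "*"):
            b = self.stack.pop()
            a = self.stack.pop()
            self.stack.append(a + b if token == "+" else a * b)
        else:
            self.stack.append(int(token))


def run(program):
    lexical = LexicalProcessor(program)
    language = LanguageProcessor()
    token = lexical.next_token()
    while token is not None:                 # the direct-execution cycle
        language.execute(token)
        token = lexical.next_token()
    return language.stack
```

Calling run("2 3 + 4 *") leaves the single value 20 on the stack.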

The direct-execution computer [1,7,8] is organized to reflect the
constructs of a high-level programming language. The organization
is shown in Fig. 2, where there are: a program memory PM, a data
memory DM, three associative memories (SAM, CAM, and DAM), and
three processors (lexical processor LP, data processor DP, and
control processor CP). The program memory stores the source
program. The data memory stores the data values. The associative
memories store descriptors which represent the data and control
information in the source program. After initialization, the
control processor fetches the next token from the lexical
processor, which has access to the program memory. It then either
executes the token or activates the data processor to execute it.
This process of direct execution token-by-token continues until
the source program reaches the end.

This paper describes a JOVIAL direct-execution computer, which
makes use of the above-mentioned direct-execution organization.

2.  A  JOVIAL  Machine 

The JOVIAL computer in this paper is designed for a subset of the
revised MIL-STD-1589A (USAF) definition of the upgraded J73 JOVIAL
programming language dated March 15, 1979 [11].

2.1  A  J73  Subset

The J73 is a dialect and an outgrowth of the ALGOL 60 programming
language [10]. As a result, the J73 retains a great deal of the
ALGOL 60 language. It is a complex compiler-oriented language. A
subset of J73 is chosen. There are 46 syntactical statements. The
syntactical constructs are outlined below.

(a)  Program  Structure 

The subset allows the complete program to have a main-program
module and zero or more procedure modules. The main program module
must be the first module. Its construct is shown below.

START PROGRAM <name> ; <program body> TERM

The construct of the program body is the same as the procedure
body except the former permits directives.


(b)  Declarations

There are four types of declarations: item, table, external, and
define. The first two declare the data elements, while the third
declares a procedure module. The last is a macro for text
substitution. The declarations may be enclosed by a pair of
'BEGIN' and 'END' to become a block declaration.

(c)  Procedures  and  Functions

There is both procedure declaration and procedure definition. The
procedure declaration is for use in the external declaration. When
a procedure definition is enclosed by a pair of reserved words
'START' and 'TERM', it becomes a procedure module. It permits
formal parameters. There is one function 'FLOAT (<number>)' which
converts an integer into a floating number.

(d)  Statements 

A statement can be simple or compound. There are four types of
simple statements: assignment, loop (or FOR-statement), IF, and
procedure-call. Statements may be enclosed by a pair of 'BEGIN'
and 'END' to become a compound statement.

(e)  Formulas

There are three types: integer, floating, and boolean. An integer
formula represents an integer, while a floating formula represents
a floating-point number. There are four operators ('+', '-', '*',
and '/') for both integer and floating-point operations. A boolean
formula represents a value of true or false. There are six
relational operators.

(f)  Data  References

There are two types of data references: variables and
function-calls. A variable can be an item or a table. As
mentioned, there is only one intrinsic function.

(g)  Lexical  Elements 

There are 56 characters which are grouped into 26 letters, 10
digits, and 20 marks. There are 4 basic lexical elements: token,
comment, define, and trace. A token can be a name, a number, a
floating-literal, or an operator. There are 34 operators which
include 15 reserved words.

A comment is a string of characters enclosed by a pair of quotation
marks. A define has as its body also a string of characters
enclosed by a pair of quotation marks. Only one directive is
permitted; this directive has as its body a series of names
separated by commas.

2.2  A  Sample  Program

A J73 sample program is shown in Fig. 3. The line numbers are not
a part of the program; they are used for reference. The numbers
are the same for the two parts of the program; they are
distinguished by being referred to as upper lines and lower lines.

The upper lines 00000 to 02000 indicate the start and the
termination of the complete program. The complete program consists
of a main program module and a procedure module. The main program
module consists of program name TSIJ0V (upper line 00100) and
program body (lines 00200 to 01900). The program body has an
external declaration (upper line 00300) of PROC TRIG which includes
a block declaration of items ANG, SANG, and CANG (upper lines 00400
to 00800), three declarations of tables DEG, SSIN, and CCOS (upper
lines 00900 to 01100), a declaration of item II (upper line 01200),
a directive of TRACE (upper line 01300), and a FOR statement (upper
lines 01400 to 01900). In this FOR statement, there is a call of
procedure TRIG (line 01700).

The procedure module begins and terminates at the lower lines
00000 and 02200, respectively. It has a procedure heading (lower
line 00100) and a procedure body (lower lines 00200 to 02100). The
procedure body has a define declaration (lower line 00200),
declarations of six items (lower lines 00300 to 00800), a comment
(lower lines 00900 to 01000), three assignment statements (lower
lines 01300 to 01500), and a FOR statement. The controlled
statement of the FOR statement is a compound statement which
consists of an assignment statement and an IF statement.

2.3  Computer  Organization

The organization of the JOVIAL direct-execution computer [13] is
developed from the direct-execution computer organization in
Fig. 2. It is shown in the diagram in Fig. 4 where there are the
following computer elements:

(a)  3 processors: LP, CP, and DP,

(b)  3 associative memories: SAM, CAM, and DAM,

(c)  2 random access memories: PM and DM,

(d)  2 tables in ROM,

(e)  10 interfacing registers, and

(f)  main bus.

The memory/register/stack structures of the 3 processors together
with the interfacing registers are shown in Fig. 5. The functions
of these 10 interfacing registers are described below.

(a)  Register SPTR points to a character in Program Memory. It is
of special importance when marking the location and other unique
pointers of control statements and procedure modules, the bodies
of define declarations, and the return position from procedure and
define calls. Except for the very first call for a token, SPTR is
set at the first character of the next token to be formed when a
token is requested. After the token has been formed by the Lexical
Processor, SPTR is advanced, if necessary, to point to the first
character of the next token.


(b)  Register TOKEN holds the last token formed by the LP. This
register is referenced by nearly every sequence, since the tokens
define the
progress 

(c)  Register TYPE stores the type of a name. A name may be a
reserved word ('R'), pseudo-function call ('FLOAT'),
trace-directive name ('DIR') or identifier ('N').

(d)  Register BLOCK stores the top entry of the Control Processor's
BSTACK. It identifies the module which the program is currently
executing so scope checks can be made on declared names.

(e)  Register BACK-PTR saves the position of the first character of
the current token.

(f)  Register D-LEV stores the top entry of the Lexical Processor's
RETURN stack. (An empty stack gives D-LEV a value of zero.) This
value identifies a specific define call or that there is no active
define call (it can be considered a 'define-activation-level').
The register's purpose is to protect the CP from creating CAM
entries for control statements whose Program Memory pointers are
in different define bodies.

(g)  Register DEF-DCL is a flag which identifies whether a define
declaration is permitted or not. Define declarations follow all
the rules associated with other declarations.

(h)  Register DEF-CALL is a flag which identifies whether or not a
define call is permitted. A define call is not allowed when the
next token expected is a module name or declaration name.

(i)  Register PROC-NAME saves the name of a procedure when the
procedure is called. The register is used to match a procedure
module name on the first call of the procedure, and as a switch to
determine if the procedure heading and declaration list must be
processed (first time) or skipped over (second time).

(j)  Register RESULT holds the information about the value, type
and structure of formulas and variables. It is used by the Data
Processor to calculate and pass values to the Control Processor.
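
How SPTR, BACK-PTR, TOKEN and D-LEV relate to one another can be shown with a minimal sketch. It rests on simplifying assumptions that are not in the paper: tokens are taken to be separated by single blanks, the class and function names are hypothetical, and the RETURN stack is modelled as a plain list of activation levels.

```python
from dataclasses import dataclass, field


@dataclass
class Registers:
    sptr: int = 0          # first character of the next token in program memory
    back_ptr: int = 0      # first character of the current token
    token: str = ""        # last token formed by the lexical processor
    return_stack: list = field(default_factory=list)

    @property
    def d_lev(self):
        # Top of the RETURN stack; an empty stack gives zero.
        return self.return_stack[-1] if self.return_stack else 0


def skip_blanks(pm, i):
    while i < len(pm) and pm[i] == " ":
        i += 1
    return i


def next_token(regs, pm):
    start = skip_blanks(pm, regs.sptr)
    end = start
    while end < len(pm) and pm[end] != " ":
        end += 1
    regs.back_ptr = start                  # remember where this token began
    regs.token = pm[start:end]
    regs.sptr = skip_blanks(pm, end)       # advance to the start of the next token
    return regs.token
```

After each request BACK-PTR marks the token just formed and SPTR already points at the one to come, which is what allows a control statement's position to be recorded before it is executed.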

3.  Control  Processor 

The control processor CP directly executes control constructs such
as conditional branch, procedure call, nesting, and looping of the
J73 subset. It also creates and stores the control descriptors in
the control associative memory CAM. These control descriptors can
expedite the repeated execution of statements in a program loop
without the need for repeated syntactical processing.


A J73 program specifies a sequence of data operations. The
sequencing is specified by control statements. The control
processor recognizes the control reserved words and then
manipulates the pointer in register SPTR (which points to the next
character in execution of the source program) of the LP processor
to carry out the sequencing.

The structure of the CP consists of one associative memory and 5
stacks as shown in Fig. 5. The functions of these memory and
stacks are described below.

(a)  Control Associative Memory CAM is to speed up statement
execution by saving critical information about control statements
and procedure modules. There are three types of CAM entries:
if-statement, loop-statement, and procedure-module. The type of
entry is stored in the Type field. Information for control
statements consists of fields for the location of the statement
(for identification), else-part pointer for if-statements,
increment formula pointer for loop-statements, and an exit pointer
to point to the token following the statement. Procedure CAM
entries store the name, location, formal-parameter-list pointer,
and body pointer of procedure-modules.

The CAM entries for some control statements composed of
define-calls cannot be made. Unless the Program Memory pointers of
a control statement's CAM entry are all on the same
'define-activation level', the proper stack management steps of
define-calls and returns may not be followed when SPTR is jumped.
When this type of statement is encountered, it will always be
treated as a 'first-time', so SPTR is adjusted by repeated calls of
sequence NEXT-TOKEN.

(b)  Stack BSTACK saves the body pointers of program-body and
active procedure modules. Each entry uniquely identifies a module.
The top entry of BSTACK is stored in register BLOCK to identify
the currently executing module. In addition, the positions of
'BEGIN's before the first simple-statement of a module body are
pushed onto (and then popped off) the stack. This is necessary
because both compound-declarations and compound-statements are
delimited by 'BEGIN' and 'END'.

(c)  Stack RSTACK saves the return position of procedure calls.
The token position following an executed procedure-call-statement
is pushed onto the stack. It is restored into register SPTR after
the procedure-body is executed.

(d)  Stack CTR-STACK's top entry serves as a counter of the
'BEGIN's pushed onto BSTACK and the parameters in a parameter list.
(e)  Stack SPTR-STACK has two fields; LOCN holds the location of an
active control statement, and DLEV holds the define-activation-level
of the location pointer. Before a control statement's PM pointer
field is assigned a value, the current activation-level must equal
that on the SPTR-STACK. If they are ever different the statement's
CAM entry cannot be kept - the location field must be erased.
Loop-statements must have their increment and exit pointers on the
same level. If they are not, the statement is considered illegal.
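
The rule that a pointer may only be recorded when the activation levels agree can be sketched as below. The record layout and the names ControlEntry and assign_pointer are assumptions made for illustration; the field set follows the CAM description in (a), and erasure of the location field is modelled by setting it to None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlEntry:
    """Hypothetical CAM entry for an if-statement or loop-statement."""
    type: str
    location: Optional[int]
    else_ptr: Optional[int] = None
    incr_ptr: Optional[int] = None
    exit_ptr: Optional[int] = None


def assign_pointer(entry, field_name, value, current_dlev, stacked_dlev):
    # A PM pointer is stored only if the current define-activation-level
    # equals the level saved on SPTR-STACK for this statement.
    if current_dlev != stacked_dlev:
        entry.location = None      # the entry cannot be kept
        return False
    setattr(entry, field_name, value)
    return True
```

An entry whose location has been erased can no longer be matched on a later visit, so the statement is processed as a first-time statement again.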

The control processor CP directs the control flow. It activates
both the data processor DP and the lexical processor LP. It
processes the following control constructs:

(a)  Program structure

(b)  Procedure definition

(c)  External declaration

(d)  Statement

(e)  IF statement

(f)  FOR statement

(g)  Proc-call statement

(h)  Declarations

In the following processor design, sequence NEXT-TOKEN, which
fetches the next token from the source program, is executed by
processor LP as will be described later.

3.1  Program  Structure 

The program structure consists of those control constructs which
form a complete program. These are shown below.

1.  <complete-program> ::= <main-program-module>
                           [<procedure-module>...]

2.  <main-program-module> ::= START PROGRAM <name> ;
                              <program-body> TERM

3.  <program-body> ::= BEGIN <decl-list> [<directive>...]
                       <statement>... END

4.  <procedure-module> ::= START <procedure-definition> TERM

The above syntax calls for the following hardware sequences.

01  COMPLETE-PROGRAM
02  INIT
02  MAIN-PROGRAM-MODULE
03  PROGRAM-BODY
02  CAM-CHECK   /*check the variables in CAM*/
04  PROC
02  NEXT-TOKEN  /*processed by LP*/

Most of the names of these sequences reflect the terminals or
non-terminals of the control syntax. The level numbers indicate
the hierarchical relationship of these sequences. These sequences
are briefly explained below.

(a)  Sequence COMPLETE-PROGRAM. This sequence reflects the syntax
that the complete program has one main-program module followed by
0 or more procedure modules.

(b)  Sequence INIT. This sequence sets the registers to zero and
empties the stacks.

(c)  Sequence MAIN-PROGRAM-MODULE. This sequence is identified by
three reserved words and a semicolon as follows.

START PROGRAM...;... TERM

This sequence identifies the main program module. It calls
sequences NAME and PROGRAM-BODY.

(d)  Sequence PROGRAM-BODY. The program body is a series of 0 or
more declarations followed by one or more statements, enclosed by
a BEGIN/END pair. The presence or absence of a declaration has to
be determined by the first token of the declaration.

(e)  Sequence CAM-CHECK. This sequence searches the CAM for an
entry whose name field is the same as the contents of register
TOKEN. If it is not found, it returns; otherwise, it is an error.

(f)  PROC-MODULE

A procedure module is identified by two reserved words as follows.

START...TERM

However, there is no need for sequence PROC-MODULE since each will
be searched and called as a result of a procedure call.

3.2  Procedure  Definition

The procedure definition specifies a procedure structure. The
syntax is shown below.

5.  <procedure-definition> ::= <procedure-heading> ;
                               <procedure-body> ;

6.  <procedure-heading> ::= PROC <name>
                            [<formal-parameter-list>]

7.  <procedure-body> ::= BEGIN <decl-list>
                         [<statement>...] END

The above syntax calls for the following hardware sequences.

03  PROC-DEF
04  PROC-DEF-HEADING
04  PROC-BODY
05  PARA-CHECKS
05  PARA-POP


These sequences are explained below.

(a)  Sequence PROC-DEF. This sequence calls sequence PROC-HEADING
and then calls sequence PROC-BODY.

(b)  Sequence PROC-DEF-HEADING. Sequence PROC-DEF-HEADING checks
the syntax of the procedure-heading and sets the parameter pointer
and body pointer of the procedure's CAM entry.

(c)  Sequence PROC-BODY. A procedure body is similar to a program
body, except for two special considerations: the formal parameters
must be declared, and the declaration list is skipped over after
the first call of the procedure.

(d)  Sequence PARA-CHECKS. This sequence checks whether the types
and structure of actual and formal parameters agree.

(e)  Sequence PARA-POP. This sequence calls DP-sequence
PARA-RESTORE to pop each of the procedure's parameters off the
parameter stacks in the DP and to return the result of output
parameters.

3.3  External  Declaration

The external declaration declares an external procedure. The
syntax is shown below.

8.  <external-declaration> ::= REF <procedure-heading> ;
                               [<declaration>]

The above syntax calls for the following hardware sequences:

01  EXTERNAL-DECL
02  SCAN-DECL
02  PROC-HEADING
03  PROC
03  FP-LIST-CHECK

These sequences are explained below.

(a)  Sequence EXTERNAL-DECL. The external declaration is
recognized from the reserved word 'REF' which is then followed by
a call of sequence PROC-DECL.

(b)  Sequence SCAN-DECL. This sequence skips the declarations of
the external procedure's parameters.

(c)  Sequence PROC-HEADING. This sequence identifies procedure
names and their parameter lists.

(d)  Sequence PROC. This sequence fetches the proc name and then
searches for it in the CAM. If it is not found, it creates a CAM
entry for the procedure and inserts the name in the entry.

(e)  Sequence FP-LIST. This sequence counts and checks the syntax
of a formal parameter list.


3.4  Statement

A statement can be a simple statement or a compound statement.
There are 4 types of simple statements: if, for, proc-call, and
assignment. The first three statements are executed by the CP. The
assignment statement is executed by the DP. Two additional
statements, define and comment, are handled by the LP. The syntax
is shown below.

9.  <statement> ::= <simple-statement>
                  | <compound-statement>

10. <simple-statement> ::= <assignment-statement>
                         | <loop-statement>
                         | <if-statement>
                         | <procedure-call-statement>

11. <compound-statement> ::= BEGIN <statement>... END

The above syntax calls for the following hardware sequences:

01  STMT
02  COMPOUND-STMT
02  SIMPLE-STMT

These sequences are explained below.

(a)  Statement. Sequence STMT calls sequence SIMPLE-STMT or
sequence COMPOUND-STMT by the absence or presence of 'BEGIN'
respectively.

(b)  Compound Statement. Sequence COMPOUND-STMT calls sequence STMT
one or more times.

(c)  Simple Statement. Out of the four types of simple statements
the IF and LOOP statements can be positively identified by 'IF'
and 'FOR', respectively. On the other hand, proc-call and
assignments begin with a name. However, the proc-call statement
begins with a procedure name which must have been declared and
should be found in the CAM. If it is not found, the name is
assumed to be the data name of an assignment statement.
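
The identification rule in (a) and (c) amounts to a decision on the first token of a statement. The sketch below is illustrative only: the function name classify_statement is hypothetical, and the CAM lookup for procedure names is reduced to membership in a set.

```python
def classify_statement(first_token, cam_procedure_names):
    """Decide which sequence handles a statement from its first token."""
    if first_token == "BEGIN":
        return "compound"
    if first_token == "IF":
        return "if"
    if first_token == "FOR":
        return "loop"
    # Both proc-calls and assignments begin with a name; only a name
    # found in the CAM is treated as a procedure call.
    if first_token in cam_procedure_names:
        return "proc-call"
    return "assignment"
```

With TRIG declared as a procedure, classify_statement("TRIG", {"TRIG"}) gives "proc-call", while an undeclared name such as ANG falls through to "assignment".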

3.5  Loop  Statement

The loop statement in Machine A is the FOR-statement. It has a
control variable, an initial value, and an incremental value. There
is additionally a while-clause which sets the condition to
terminate the looping. The syntax is shown below.

12. <loop-statement> ::= FOR <variable> : <integer-formula>
                         BY <integer-formula>
                         WHILE <boolean-formula> ;
                         <statement>

The above syntax calls for the following hardware sequences.

01  LOOP-STMT
02  SCAN-STMT
02  FIRST-TIME-LOOP
02  REPEAT-WHILE

The loop statement faces 3 considerations: (a) looping, (b) nesting
of loop statements, and (c) first-time problems.

(a)  Looping. The looping requires computation of the new value of
the control variable and evaluation of the boolean formula. If the
evaluated result is true, the loop body is executed, and if the
loop body is executed for the first time, the EXIT-PTR is inserted
in the CAM entry. If the evaluated value is false, the loop's
statement is scanned and the EXIT-PTR marked, or execution is
directly jumped to EXIT-PTR.


executed. During successive times, no scanning is needed since all
pointers have been established.

(d)  Optional Else-clause. The else flag is available in the CAM
entry to indicate whether there is an else-clause. If there is,
the else-flag is set and the ELSE-PTR is inserted.

3.7  Procedure  Call  Statement

The procedure-call statement invokes the execution of a procedure
definition. It should be noted that the procedure definition may
occur before or after a procedure-call statement. If it is before,
the location of the procedure definition can be found from the
CAM. If it is after, the program execution has to be suspended and
the source program is scanned until the procedure definition is
found. The syntax of the procedure call statement is shown below.


(b) Nesting of Loop Statements. The nesting of loop
statements (and if statements) is handled by
pushing the LOCN fields of the CAM entry onto stack
SPTR-STACK at the beginning of the sequence
and by popping them off at the end.

(c) First-Time Problem. If no CAM entry exists for
this statement, one must be created. The
location and increment formula position are
stored in addition to the EXIT-PTR.
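
The first-time behaviour described above can be sketched in Python. This is only an illustration under assumptions: the dictionary `cam`, the functions `scan_stmt` and `exit_ptr_for`, the token list, and the `stats` counter are hypothetical names introduced here, not part of Machine A. Only the idea of scanning once, recording the EXIT-PTR in a CAM entry, and reusing it on later passes comes from the text.

```python
# Hypothetical sketch: a CAM keyed by statement location caches the
# EXIT-PTR found by scanning, so the scan happens once per statement.

def scan_stmt(tokens, start):
    """Scan one simple statement; return the position just past its ';'."""
    pos = start
    while tokens[pos] != ";":
        pos += 1
    return pos + 1


def exit_ptr_for(cam, tokens, loop_loc, body_loc, stats):
    """Return the EXIT-PTR of the loop at loop_loc, scanning only if needed."""
    entry = cam.get(loop_loc)
    if entry is None:                       # first-time problem: no CAM entry yet
        stats["scans"] += 1
        entry = {"exit_ptr": scan_stmt(tokens, body_loc)}
        cam[loop_loc] = entry               # later passes find the entry directly
    return entry["exit_ptr"]


tokens = ["FOR", "I", ":", "0", "BY", "1", "WHILE", "I", "<", "3", ";",
          "X", "=", "X", "+", "I", ";", "Y", "=", "0", ";"]
cam, stats = {}, {"scans": 0}
first = exit_ptr_for(cam, tokens, 0, 11, stats)    # scans the loop body
second = exit_ptr_for(cam, tokens, 0, 11, stats)   # served from the CAM
```

The second lookup performs no scanning, which mirrors the remark that no scanning is needed once the pointers have been established.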

3.6 If Statement

The if statement causes conditional branching.
The syntax is shown below.

13. <if-statement> ::= IF <boolean-formula> ;
        <statement>
        [ELSE <statement>]

The if statement faces 4 considerations:
(a) branching, (b) nesting of if statements,
(c) first-time problem, and (d) optional else-clause.
These considerations are discussed below.


14. <procedure-call-statement> ::= <name>
        [<actual-parameter-list>] ;

The above syntax calls for the following sequences.

01 PROC-CALL-STMT
02 SCAN-UNTIL-PROC
02 PROC-DEFINITION

The procedure-call statement faces 5
considerations: (a) existence of parameters,
(b) nesting of proc calls and returns, (c) first-time
problem, (d) ahead or behind a proc definition,
and (e) call-by-value or by-reference.
These considerations are discussed below.

(a) Parameters. The parameters may or may not
exist. They can be input or output parameters.
Their presence is determined by the parameter
count field of the procedure's CAM entry.
The DP is then activated to execute sequence
ACTUAL-PARAM-LIST.


(a) Branching. The branching requires evaluation of
the boolean formula. If the evaluated result is
true, the then-clause is executed and the
execution continues at the location indicated
by the ELSE-PTR if it exists, and otherwise the
EXIT-PTR.

(b) Nesting of If Statements. The nesting of if
statements is handled by pushing the LOCN fields
of their CAM entries onto stack SPTR-STACK at
the beginning of the sequence and by popping them
off at the end.

(c) First-time or Second-time. During the first
time, if the boolean formula is true, the then-clause
is executed but the else-clause is
scanned by sequence SCAN-STMT. If the boolean
formula is false, the then-clause is scanned by
sequence SCAN-STMT, but the else-clause is


(b) Nesting of Proc Calls and Returns. When a
proc-call statement is encountered, the return
address of the calling procedure is pushed
onto RSTACK. When the execution of a
procedure reaches the end, the return address
is obtained from the top entry of RSTACK and
the entry is then popped off.

(c) First-time Problem. If the PROC-NAME register
is not empty, the procedure is being called for the
first time. During the first time, program
execution is changed into program
scanning until the procedure definition is
found. This identification is achieved by
comparing each procedure name encountered
during scanning with that in the NAME field
of the top entry of RSTACK. The scanning is
done by sequence SCAN-UNTIL-PROC.

(d) Second-time Problem. The declaration list
of the procedure body is not processed on
succeeding calls.

(e) Call-by-value or by-reference. The parameter
passing in J73 is as follows.

(1) Formal-input parameter: it must be an
item; it is bound by value.

(2) Formal-output parameter: if it is an item,
it is bound by value-result. If it is a
table, it is bound by reference.

(3) Actual-input parameter: it can be an
integer or a floating formula.

(4) Actual-output parameter: it must be a
variable.

The evaluation and passing of parameters are
handled by the DP.
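
The binding rules for items can be illustrated with a short Python sketch. It is not the DP's mechanism: `call_proc`, `add_to`, the list `dm` standing in for data memory, and the list of output locations are all hypothetical names chosen for this example. The sketch only shows the difference between binding an input by value and binding an output item by value-result.

```python
# Hypothetical sketch of item parameter binding: inputs are copied in
# (by value); output items are copied in and copied back (value-result).

def call_proc(proc, dm, inputs, output_locns):
    """Call proc with copies of the inputs and of the output items."""
    in_vals = list(inputs)                          # by value: callee gets copies
    out_vals = [dm[loc] for loc in output_locns]    # value-result: copy in
    proc(in_vals, out_vals)
    for loc, val in zip(output_locns, out_vals):    # value-result: copy back
        dm[loc] = val


def add_to(ins, outs):
    outs[0] += ins[0]     # visible to the caller after the call returns
    ins[0] = 99           # changes only the callee's private copy


dm = [0, 5]               # two data-memory words
args = [7]                # one actual input value
call_proc(add_to, dm, args, [1])
```

After the call, the word at location 1 holds the updated value while the caller's input value is unchanged.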

3.8  Declarations 

The declaration statements consist of:

15. <decl-list> ::= (<declaration>
        | <define-declaration>
        | BEGIN <decl-list> END)...

16. <declaration> ::= <item-declaration>
        | <table-declaration>
        | <external-declaration>

The above syntax calls for the following
hardware sequences.


01 DECL-LIST
02 DECL

(a) Sequence DECL-LIST processes the declarations
of a program-body or procedure-body. Names
may not be declared twice in the same module,
nor duplicate a procedure-name. A define-call
is not permitted when the name of a
declaration is the next token expected.
'BEGIN' reserved words are stacked because
they may signal either a compound declaration
or a compound-statement. After all the declarations
have been processed, SPTR is adjusted,
if necessary, to point to the token which
begins the first directive or statement.

(b) Sequence DECL calls either ITEM-DECL, TABLE-DECL
or EXTERNAL-DECL to process a declaration.

4. Data Processor

A J73 program specifies data elements in data
declarations and type declarations. It also specifies
data operations by assignment statements;
for example, the operations can be arithmetic or
logical. When the control processor identifies a
data operation, it activates the data processor.

The data processor DP directly executes the
data constructs of the J73 language. It recognizes
data and type declarations, creates data
descriptors, and stores the data descriptors in
the data associative memory DAM. The data descriptors
in the DAM allow data references by
symbolic names and permit rapid access of data
values in the data memory. In addition, the DP
executes assignment statements, evaluates
formulae, and handles parameters. The structure
of the data processor DP consists of one associative
memory, 1 register, and 6 stacks as shown in
Fig. 5.

Fig. 2 Organisation of a Direct Execution Computer

(a) The Data Memory stores the values of declared
variables. One word of storage is allocated
for each item or table element.

(b) Data Associative Memory DAM stores information
about declared numeric variables.
Name and Block-id fields identify each
entry. Items and one-dimensional tables
are the only possible structures. Possible
types are signed and unsigned integers and floating
real numbers. The Size field identifies
the number of DM words allocated for the
variable. The trace-id field acts as a
flag which identifies whether the variable
is being traced.

(c) Register TRACE is a two-field register
that serves as a flag to identify
whether a variable is the object of a
TRACE directive. The Flag field is
the switch, and the Name field saves
the name of an assignment statement's
variable for use in the output message
which notes the variable's new value.

(d) Stack SYNTAX contains the current
syntax productions being executed.

(e) Stack VSTACK holds the value and type
of a formula's operands and intermediate
results. Operands must be items or
tables.

(f) Stack PSTACK holds the DM-locn of variables.
The DM-locns of loop statement control
variables remain on the stack throughout
execution of the loop statement, since the
control variable is changed on every iteration.

(g) Stack OPSTACK saves lower precedence operators
during evaluation of a formula.

(h) Stack APSTACK contains seven fields which
save information about the actual parameters
of a procedure call. Five of the fields
hold the value, type, structure, size and
parameter type (input or output) of a
parameter. In addition, an output parameter's
DM-locn is saved (so its value can be
returned), and its name is saved if it is
being traced.

(i) Stack FPSTACK has three fields to save information
about an active procedure's formal
parameter. The name and parameter type make
up two fields. The third (Decld) is a flag
which is set during the procedure's first
execution if that parameter is declared in
the module. All formal parameters must be
declared. Also, the number, type and structure
of actual and formal parameters must
match.


The data processor DP processes data declarations
and controls data flow. It is activated by
CP, but it also communicates with LP. The data
constructs that are processed by DP are:

(1) directive

(2) item and table declarations

(3) assignment statement

(4) formulas

(5) boolean formula

(6) variable and subscript

(7) formal parameters

(8) actual parameters

4.1 Directives

The TRACE directive is a special statement
which directs a message to be output whenever a
variable in its name list is assigned a value.
The syntax is:

17. <directive> ::= !TRACE <name>,... ;

4.2 Item and Table Declarations


Machine A accepts data, procedure, define,
and block declarations. The CP
executes define declarations. The DP executes
data and block declarations. The syntax of
declarations is shown below.

18. <item-declaration> ::= ITEM <name> (S|U|F) ;

19. <table-declaration> ::= TABLE <name>
        [<dimension>] (S|U|F) ;

20. <dimension> ::= (<integer-formula>)

The above syntax calls for the following
hardware sequences.

01 ITEM-DECL
02 ITEM

01 TABLE-DECL
02 TABLE
02 DIMENSION

(a) Item Sequences

Item sequences consist of sequence ITEM-DECL
and sequence ITEM, which create an entry in
the DAM from the name and attribute in the item-declaration
and allocate a DM word.

(b) Table and Dimension Sequences

Table sequences consist of sequences TABLE-DECL,
TABLE, and DIMENSION. Sequences TABLE-DECL
and TABLE create an entry in the DAM.
Sequence DIMENSION calculates the value of the
dimension; only one dimension is allowed, and it
needs only an upper bound (the lower bound is 0). This
value is inserted into the size field, and a
block of contiguous DM words equal in number to this
value is allocated.
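
The effect of the item and table sequences on the DAM and the data memory can be sketched as follows. This is a minimal illustration, assuming a Python dictionary in place of the associative memory and a list in place of the DM; the class `DataStore`, its method names, and the descriptor keys are hypothetical. The sketch takes the dimension value directly as the number of words to allocate, as the text states.

```python
# Hypothetical sketch: declarations create a DAM descriptor keyed by
# (name, block-id) and allocate contiguous words in the data memory.

class DataStore:
    def __init__(self):
        self.dm = []      # data memory: one word per item or table element
        self.dam = {}     # (name, block_id) -> descriptor

    def declare_item(self, name, block_id, typ):
        return self._declare(name, block_id, typ, "item", 1)

    def declare_table(self, name, block_id, typ, size):
        return self._declare(name, block_id, typ, "table", size)

    def _declare(self, name, block_id, typ, structure, size):
        key = (name, block_id)
        if key in self.dam:                   # names may not be declared twice
            raise ValueError(f"{name} declared twice in block {block_id}")
        locn = len(self.dm)                   # first free DM word
        self.dm.extend([0] * size)            # allocate a contiguous block
        self.dam[key] = {"type": typ, "structure": structure,
                         "size": size, "locn": locn}
        return locn


store = DataStore()
x = store.declare_item("X", 0, "S")
t = store.declare_table("T", 0, "F", 4)
y = store.declare_item("Y", 0, "U")
```

Looking up `store.dam[("T", 0)]` then yields the descriptor that lets a symbolic name be turned into a DM location.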


4.3 Assignment Statement

An assignment statement causes the value of
a formula at the right of an equal sign to be
assigned to the variable at the left of the equal
sign. A variable is a name or a subscripted name.
A subscript is an integer enclosed by a pair of
brackets. The syntax is shown below.

21. <assignment-statement> ::= <variable> =
        <formula>


(a) Sequence ASSIGN-STMT

This sequence calls sequence VARIABLE to
identify the object variable, and then calls
sequence FORMULA to evaluate the formula. It then
stores the formula's value into the DM location
pointed to by the top entry of PSTACK.

(b) Sequence TRACE-CHECK

Sequence TRACE-CHECK identifies whether the
variable in an assignment statement or the output
portion of an actual-parameter-list is being
traced.

4.4 Boolean Formula

A boolean formula represents a value of TRUE
or FALSE. It occurs in the IF-clause or the WHILE-clause.
It can be either a formula followed by a
relational operator and a further formula,
or a formula alone. The syntax is shown below.


24. <boolean-formula> ::= <formula>
        [(< | <= | = | >= | > | <>) <formula>]


The resulting value from a relational operation
is either integer 1 for TRUE or zero for FALSE. The
truth value of the boolean formula's result is
determined by examining its low-order bit: a '1'
is TRUE, a '0' is FALSE. This implementation makes
odd integers evaluate to TRUE and even integers to
FALSE.
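
The low-order-bit rule is easy to state in code. The sketch below is illustrative only; the function names `relational` and `truth_value` and the operator table are assumptions of this example, not names from Machine A.

```python
import operator

# Hypothetical sketch: relational operations yield integer 1 or 0, and the
# truth of any formula's result is taken from its low-order bit.

_REL_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
            ">=": operator.ge, ">": operator.gt, "<>": operator.ne}


def relational(op, left, right):
    """Return integer 1 for TRUE or 0 for FALSE."""
    return 1 if _REL_OPS[op](left, right) else 0


def truth_value(result):
    """Examine only the low-order bit: odd is TRUE, even is FALSE."""
    return (result & 1) == 1
```

With this rule an arithmetic result such as 3 counts as TRUE and 4 as FALSE, exactly as the text notes for odd and even integers.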

4.5 Formula

A formula represents a value. It can be an
integer formula or a floating formula, representing
either an integer or a floating-point number,
respectively. An integer formula is a positive
or a negative integer term, which can be added to or
subtracted from a succeeding integer term. This
intermediate result can then be added to (or subtracted
from) another integer term, and so on.
(The arithmetic operators are left associative.)
An integer term is an integer factor, which can be
multiplied or divided by succeeding integer factors
(as with terms). An integer factor can be
an integer literal, a variable, or an integer
formula enclosed by a pair of parentheses.
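
The structure of an integer formula described above (signed term, then terms, then factors) maps naturally onto a small recursive evaluator. The following Python sketch is an illustration under assumptions: it works on a pre-split token list, the function `eval_integer_formula` and the variable table `env` are hypothetical, and division uses Python floor division because the text does not say how Machine A rounds integer quotients. It does not model the VSTACK and OPSTACK hardware.

```python
# Hypothetical sketch of integer-formula evaluation with left-associative
# operators: formula -> [+|-] term {(+|-) term}; term -> factor {(*|/) factor}.

def eval_integer_formula(tokens, env):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = take()
        if tok == "(":                      # parenthesized integer formula
            value = formula()
            if take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if tok.isdigit():                   # integer literal
            return int(tok)
        return env[tok]                     # variable

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value = value * factor()
            else:
                value = value // factor()   # assumption: floor division
        return value

    def formula():
        sign = -1 if peek() == "-" else 1
        if peek() in ("+", "-"):
            take()
        value = sign * term()
        while peek() in ("+", "-"):         # left to right: left associative
            if take() == "+":
                value = value + term()
            else:
                value = value - term()
        return value

    result = formula()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return result
```

For example, the tokens of `8 - 3 - 2` are combined from the left, giving (8 - 3) - 2.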

Floating formulae are similar to integer
formulas, except that a factor must be of floating
type. In addition, a floating factor may be a
call of function FLOAT, which converts an integer
formula's value to floating form. The syntax for

(a) Actual Input Parameters

The actual input parameters can be 0 or
more formulae. Each formula is evaluated, and
its value, type, structure and parameter type are
pushed onto APSTACK.

(b) Actual Output Parameters

The actual output parameters can be 1 or
more variables. Since each of the actual output
parameters that are not tables is returned a new
value, their DM-locns are also saved in the
APSTACK. Output parameters being traced also have
their names placed in the Name field.

(c) Parameter Matching

Corresponding actual and formal parameters
must agree in type, structure, size, and input/output
type.

5. Lexical Processor

The J73 program is a string of characters.
The lexical processor LP scans the characters in
the source program, checks their legality, and
assembles them into tokens. The tokens can be
reserved words such as "ITEM" and "IF", operators
such as "+", names, or numbers. The
lexical processor together with the associative
memory SAM also handles define declarations,
define calls, and comments. It also handles the
directive.

The structure of the lexical processor LP
consists of an associative memory and registers
as shown in Fig. 5. They are described below.

(a) Program Memory PM contains the text of the
JOVIAL program to be executed. It is arranged
as one long string of characters. Each
character is assigned an ordinal position so it
can be identified by register SPTR.

(b) Scanner Associative Memory SAM stores
information about define declarations. For each
valid define declaration, an entry is created
to store the name of the declaration, the location
of the first character of the define body and the
first location after the last character of the define body.

(c) Table LEGALCHAR contains valid characters
of the JOVIAL syntax and their respective classes.

(d) Table RESERWORD contains reserved words and
their type. Special reserved words are 'DEFINE'
(type 'D') and 'FLOAT' (type 'FLOAT'); the others
are type 'R'.

(e) Register CHAR holds the last character
fetched from Program Memory.

(f) Register CLASS holds the class of the
character stored in register CHAR. The class, an
integer, is found by searching the LEGALCHAR
table.

(g) Stack RETURN saves the SPTR position
immediately following a define call so that,
after SPTR has advanced over the define body, it
is reset to the proper position to continue
program execution.

(h) Stack DEF-END saves the end-ptr positions
of the bodies of active define calls. When SPTR
reaches the position pointed to by the top entry
of the stack, that define call is completed and
a return is performed by popping DEF-END and
popping RETURN into SPTR. Recursive define calls
are not allowed.

(i) There are two tables: LEGALCHAR and RESERWORD.
The LP needs to check each character of the
source program to determine whether it is legal
by looking up table LEGALCHAR. It also needs to
determine whether the new token is a reserved
word by looking up table RESERWORD. The legal
character table is shown in Table 2; there are
56 legal characters in 10 classes. The reserved
word table is shown in Table 3; there are 19
reserved words.

The lexical processor LP scans the source
string of characters, checks their legality, and
assembles them into tokens. It is activated
by either CP or DP. The lexical constructs are:

(1) token

(2) character

(3) name

(4) number

(5) operator

(6) define and comment

The hardware sequences of the LP, which have
sequence NEXT-TOKEN as the root sequence, consist
of:

01 NEXT-TOKEN
02 NEXT-CHAR
02 NAME
02 DIRECTIVE-NAME
02 DEFINE-DECL
02 DEFINE-CALL
02 REL-OP
02 NUMERICAL-LITERAL
03 EXPONENT
03 FRACTION
02 COMMENT

5.1 Token

A token is the lexical element of a source
program. It can be a name, a number, an operator,
or a separator as shown in the syntax below.

32. <next-token> ::= <name>
        | <numeric-literal>
        | <operator-separator>
        | <reserved-word>

38. <operator-separator> ::= ( | ) | : | ; | , | = | +
        | - | * | / | " | ' | . | <>
        | < | > | <= | >= | ! | blank

39. <reserved-word> ::= START | PROGRAM | TERM
        | BEGIN | END | ITEM | TABLE | REF
        | PROC | FOR | BY | WHILE | IF | ELSE
        | E | S | U | F

Sequence NEXT-TOKEN is designed to assemble
the adjacent characters in the source program into
a token. It extracts the next logical group of
characters (the next token) from PM. The token
may be a reserved word, identifier, numeric-literal,
operator, separator or directive. This
sequence also handles define-declarations (macro
definitions) and define-calls, because they affect
the control flow of program text.

Initially, the starting position of the token
is stored in register BACK-PTR. Then the token is
formed. If the token is the reserved word 'DEFINE',
a define-declaration is processed; if it is an
identifier with an entry in the SAM, a define-call
is processed. When the next token to be passed to
the other processors has been formed, the 'noise'
following it is skipped over. Noise consists of
blanks, illegal characters and comments. Upon
return, the token will be in register TOKEN, its
type will be in register TYPE, and register SPTR
will be pointing to the beginning of the next token
to be formed.

Sequence NEXT-TOKEN fetches the next char from
the source program and then acts according to the
class number of the character as follows.

class 1: An illegal char. Call ERROR.

class 2: A blank. Skip the blank.

class 3: A letter. The succeeding characters
are assembled into a name. The name
can be a reserved word, an LP command,
or an operand name.

class 4: A digit or period. The succeeding
characters are assembled into a
number.

class 5: A decimal point. This case is handled
the same as class 3.

class 6: Unary operator '+' or '-'. It is
stored in register TOKEN.

class 7: An operator. It is stored in register
TOKEN.

class 8: A '<' or '>'. A two-character
operator (such as '<=' or '<>') is
assembled.

class 9: A '!'. A directive name is assembled
and identified.

class 10: A double-quote. A comment is flushed
out.
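
The class-driven dispatch of NEXT-TOKEN can be sketched in Python. This is a simplified illustration, not the hardware sequence: the functions `char_class`, `next_token`, and `tokenize` are hypothetical, the assignment of individual characters to classes is an assumption (Table 2 is not reproduced here), names are assumed to be a letter followed by letters or digits, and noise is skipped before a token rather than after it.

```python
# Hypothetical sketch of class-based token assembly. Class numbers follow
# the list in the text; the exact character-to-class table is assumed.

def char_class(ch):
    if ch == " ":
        return 2
    if "A" <= ch <= "Z":
        return 3
    if "0" <= ch <= "9":
        return 4
    if ch == ".":
        return 5
    if ch in "+-":
        return 6
    if ch in "*/=():;,'":
        return 7
    if ch in "<>":
        return 8
    if ch == "!":
        return 9
    if ch == '"':
        return 10
    return 1                                   # anything else is illegal


def next_token(text, sptr):
    """Return (token, type, new_sptr); token is None at end of text."""
    while sptr < len(text):
        ch = text[sptr]
        cls = char_class(ch)
        if cls == 1:
            raise ValueError(f"illegal character {ch!r} at {sptr}")
        if cls == 2:                           # blank: skip it
            sptr += 1
        elif cls == 10:                        # comment: flush to closing quote
            sptr = text.index('"', sptr + 1) + 1
        elif cls == 3:                         # letter: assemble a name
            end = sptr
            while end < len(text) and char_class(text[end]) in (3, 4):
                end += 1
            return text[sptr:end], "name", end
        elif cls in (4, 5):                    # digit or point: a number
            end = sptr
            while end < len(text) and char_class(text[end]) in (4, 5):
                end += 1
            return text[sptr:end], "number", end
        elif cls == 8:                         # maybe a two-character operator
            two = text[sptr:sptr + 2]
            if two in ("<=", ">=", "<>"):
                return two, "operator", sptr + 2
            return ch, "operator", sptr + 1
        elif cls == 9:                         # '!': assemble a directive name
            end = sptr + 1
            while end < len(text) and char_class(text[end]) == 3:
                end += 1
            return text[sptr:end], "directive", end
        else:                                  # classes 6 and 7
            return ch, "operator", sptr + 1
    return None, "end", sptr


def tokenize(text):
    tokens, sptr = [], 0
    while True:
        tok, typ, sptr = next_token(text, sptr)
        if tok is None:
            return tokens
        tokens.append((tok, typ))
```

Calling `tokenize` on a short statement shows how blanks and a quoted comment disappear while names, numbers, operators, and a directive name come out as separate tokens.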

5.2 Character

A character can be a letter, a digit, or a
mark. There are 10 digits, 26 letters, and 17
marks, as shown below.

36. <character> ::= <letter>
        | <digit>
        | <mark>

43. <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6
        | 7 | 8 | 9

44. <letter> ::= A | B | C | D | E | F | G
        | H | I | J | K | L | M | N | O
        | P | Q | R | S | T | U | V | W
        | X | Y | Z

43. <mark> ::= + | - | * | / | =
        | . | : | , | ; | ( | ) | '
        | " | < | > | ! | blank


Sequence NEXT-CHAR fetches the next character
from the source program in program memory.
The next character is pointed to by register SPTR
and becomes available in register CHAR. A test
must now be made to determine if SPTR points to
the position next to the end of a define body by comparing it
to register DEF-END. If it does, SPTR is given
the value of register RETURN (i.e., to return from
the define call) before the next character is
made available. The character is then tested for
legality and register CLASS is set to the class
number of the character.

5.4 Numeric-literal

A numerical-literal is a positive integer,
and a floating-literal is a numerical-literal
with a decimal point. The lexical rules for
numerical-literals and floating-literals are
shown below.

34. <numeric-literal> ::= <integer-literal>
        | <floating-literal>

35. <integer-literal> ::= <digit>...

36. <floating-literal> ::= <digit>...
        <exponent>
        | [<digit>...] . <digit>...
        [<exponent>]

37. <exponent> ::= E [+|-] <integer-literal>

Sequence NUMERICAL-LITERAL needs to detect
the sequential combinations of digit, period,
'E', and others. There are 3 sequences
as shown below.

01 NUMERICAL-LITERAL
02 EXPONENT
02 FRACTION

Sequence NUMERICAL-LITERAL constructs
numerical-literals. There are two types: integer
(type 'I') and floating (type 'FL'). Floating-literals
have a decimal point and/or an exponent;
integer-literals have neither. Sequence FRACTION
extracts the digits following the decimal point
of a floating-literal, while sequence EXPONENT
extracts the exponent part of a floating-literal.
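
The distinction between the two literal types can be sketched with a regular expression. This is an illustration only: the pattern, the function `classify_literal`, and the conversion to Python `int` and `float` are assumptions of this example, while the type codes 'I' and 'FL' and the rule that a floating-literal has a decimal point and/or an exponent come from the text.

```python
import re

# Hypothetical sketch: whole digits, optional fraction, optional exponent.
_LITERAL = re.compile(r"([0-9]*)(\.[0-9]*)?(E[+-]?[0-9]+)?")


def classify_literal(text):
    """Return ('I', int) or ('FL', float) for a numerical-literal."""
    match = _LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numerical-literal: {text!r}")
    whole, fraction, exponent = match.groups()
    if not any(c.isdigit() for c in whole + (fraction or "")):
        raise ValueError(f"not a numerical-literal: {text!r}")
    if fraction is None and exponent is None:
        return "I", int(whole)               # neither point nor exponent
    return "FL", float(text)                 # point and/or exponent present
```

In this sketch the optional groups play the roles the text assigns to sequences FRACTION and EXPONENT.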

5.5 Relational Operators

The operators of Machine A consist of single-
and double-character operators and the reserved
words. Sequence REL-OP extracts the two-character
relational operators, such as '>=' and '<>'.

5.6 Comment

The comment is a string of 0 or more
characters enclosed by a pair of quotes. The syntax
is shown below.

35. <comment> ::= "[<character>...]"

Sequence COMMENT flushes out the string of characters.

5.7 Define (Fig. 27)

The define-declaration is a macro definition;
its body, like a comment, is a string of 0 or
more characters. The syntax is shown below.

43. <define-declaration> ::= DEFINE <name>
        "[<character>...]"

46. <define-call> ::= <name>

Sequence DEFINE-DECL processes a define-declaration.
The define-name cannot be the same
as any name declared in the same module or any procedure
name. For each valid declaration, a SAM entry
is created to hold the
name, module-id, location of the first character
of the define-body, and location of the double-quote
(") which signals the end of the define-body.
The define-body
is enclosed in double-quotes, so no comments are
allowed between the define-name and define-body.

Sequence DEFINE-CALL processes a define-call.
A define-call is not allowed when the name of a
declaration or a procedure is the next token expected.
On a valid call, the return location is
saved by pushing it onto stack RETURN, register
SPTR is assigned to point to the first character
of the define-body and the end position of the
define-body is pushed onto stack DEF-END.

The top of RETURN identifies the 'define-activation
level' of the source program. This
level needs to be known by the CP to determine if
control statements may have CAM entries, so it is
always stored in register D-LEV.
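
The interplay of SPTR with stacks RETURN and DEF-END during a define call can be sketched as follows. This is a minimal illustration under assumptions: the class `Scanner`, its attribute names, and the use of a dictionary for the SAM are hypothetical, and the end position is taken to be the location of the closing double-quote, as in the DEFINE-DECL description.

```python
# Hypothetical sketch of DEFINE-CALL and the return test in NEXT-CHAR.

class Scanner:
    def __init__(self, pm, sam):
        self.pm = pm              # program memory: the source text
        self.sam = sam            # define name -> (first, end) of its body
        self.sptr = 0
        self.return_stack = []    # stack RETURN
        self.def_end = []         # stack DEF-END

    def define_call(self, name, return_pos):
        first, end = self.sam[name]
        self.return_stack.append(return_pos)   # where to resume afterwards
        self.def_end.append(end)               # where the body stops
        self.sptr = first                      # continue inside the body

    def next_char(self):
        # Return from every define call whose body has been exhausted.
        while self.def_end and self.sptr == self.def_end[-1]:
            self.def_end.pop()
            self.sptr = self.return_stack.pop()
        ch = self.pm[self.sptr]
        self.sptr += 1
        return ch


pm = 'DEFINE TWO "1+1" X=TWO;'
first = pm.index('"') + 1            # first character of the define-body
end = pm.index('"', first)           # closing double-quote
scanner = Scanner(pm, {"TWO": (first, end)})
scanner.define_call("TWO", pm.index(";"))   # the call to TWO was just read
expanded = "".join(scanner.next_char() for _ in range(4))
```

The four characters fetched after the call are the define-body followed by the character that follows the call in the source text, and both stacks are empty again afterwards.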

6.  Concluding  Remarks 

The above JOVIAL Direct-Execution Machine A
directly reflects the language constructs of the
J73 language. The lexical processor directly recognizes
the legal characters, reserved words,
operands, and operators. It assembles tokens and executes
lexical "commands" (such as the DEFINE);
this lexical processor organization reflects the lexical
constructs of the J73 language. The control processor
directly executes the control statements
and sequences the order of execution of the assignment
statements; this control processor
organization reflects the control constructs of
the J73 language. The data processor directly references
symbolic names and executes data operations;
this data processor organization reflects
the data constructs of the J73 language.

The above JOVIAL Machine A is a multiprocessor
system, each processor performing a
function reflecting language constructs. If the
lexical processor were operated in a parallel but
synchronized manner with the control processor and
data processor, the repeated lexical processing in
a program loop would not impede the execution
speed. By using the information on the control
structure of the source program in the associative
memory CAM, there need be no repeated syntactical
processing of the control statements in a program
loop.

The idea of a direct-execution machine is
simple, but its structure can be highly complex
if the programming language, such as JOVIAL, is
complex. Thus, there are two issues: the issue
of the programming language and the issue of the
computer architecture for the programming language.
Criticisms of a particular direct-execution
machine should state clearly whether they address
the language issue or the architecture
issue.

Abstract

Researchers have realized that von Neumann
machines do not adequately provide for the constructs
that occur in common programming languages.
Most of these shortcomings are attributable to a
phenomenon known as the semantic gap. Over the past
decade there has been increased interest in building
machines that have a smaller semantic gap. It can be
conjectured that there exists an 'ideal' directly
executable language (DEL) which describes an architecture
with a smaller semantic gap than conventional
machines. The proof of this conjecture will
enable us to evaluate candidate machine instructions
and to select the most suitable machine language
for a given computing environment. In order to
prove this conjecture, certain characteristics of
machines, like the level of a machine with respect
to a high-level language, must be quantified.
Halstead's Software Science metrics are used for
this purpose.


Introduction 

Before we start our introduction, we would
like to define precisely the meaning of the term
architecture as used in this paper. Computer
architecture is the virtual machine as viewed by a
machine language programmer. This is the view held
by Flynn [75]. Thus, changing the machine language
(assembler language) changes the architecture.
Using the same argument, all models of the IBM/370 have
the same architecture.

Researchers have realized that von Neumann
machines do not adequately provide for the constructs
that occur in common programming languages.
Most of these shortcomings are attributable to a
phenomenon known as the semantic gap (Gagliardi [73]).
The semantic gap is a measure of the difference
between the concepts in high-level languages and
the concepts in computer architecture. Most current
systems have an undesirably large semantic gap in
that the objects and operations reflected in their
architecture are rarely closely related to the
objects and operations provided in programming
languages. As shown by Myers [78], this large
semantic gap contributes to software unreliability,
performance problems, excessive program size, compiler
complexity and distortions of the programming
languages, all of which contribute negatively to
the economics of data processing.


The semantic gap can be reduced by constructing
a high-level language machine for each language.
Such high-level language machines have many advantages
(Tanenbaum [76]). Over the past decade,
there has been increased interest in building
machines that have a smaller semantic gap. These
attempts are surveyed in Carlson [73] and Myers
[78]. The proposed designs fall into 3 categories:

1. 'Truly' high-level language processors.

2. 'Pseudo' high-level language processors.

3. Intermediate language processors.

In 'truly' high-level language processors
(e.g. Bloom [73]), the processor accepts a program
string written in a high-level language and performs
operations as determined by the semantics of
the program string. The important characteristic
of this design is that the architecture operates on
the program directly. A little thought will convince
the reader that such a design is not the
ideal alternative to von Neumann architectures from
either the memory size standpoint or the interpretation
time standpoint (Hoevel [74]).

In 'pseudo' high-level language processors
(e.g. Burkle et al. [78]), the source program is
preprocessed; the software preprocessor performs a
lexical transformation on the input, changing the
keywords and operators into internal code. All
data objects in the program are replaced by references
to memory locations. With the exception of
superfluous blanks, preprocessing is an isomorphism.

The two high-level language processor designs
described above are highly source language dependent,
and so a machine should be constructed for
each high-level language. In the case of intermediate
language processors, the source program is
converted into a program in an intermediate language.
The resulting surrogate program is executed by the
architecture. It has been established (Wade and
Schneider [73] and Lancaster [72], [76]) that a
certain set of semantic primitives can adequately
express the major portion of the semantics of
programs written in any of the several common high-level
languages. Therefore, it is conjectured
(Wade and Schneider [73]) that by designing a computer
organization which implements a set of semantic
primitives that describe common high-level
constructs, one instruction per primitive, speed
increases approaching that of a 'truly' high-level
language processor can be achieved while retaining
the flexibility characteristic of software-dominated
conventional machines.

The authors believe that the intermediate language processor is
the desirable choice. The authors also believe that there
exists a direct relationship between the level of a target
machine with respect to the source language (cf. SECTION 2) and
the machine's dependence on the source language. That is to
say, the higher the level of a machine with respect to a
language, the more language dependent the machine will be.

Because of this relationship one can measure the closeness of a
language to the machine. It can be conjectured that there
exists an 'ideal' directly executable language (DEL) which
describes an architecture with a smaller semantic gap than
conventional machines. Hoevel [74] gave an analytical argument
to show the existence of an 'ideal' directly executable
language which performs better than conventional machines. In
order to prove the above conjecture we must quantify certain
characteristics of machines, like the level of a machine with
respect to a source language and the semantic gap. This is the
topic of the present research. The metrics defined in this work
are based on Halstead's (Halstead [77]) Software Science. This
research is a step in the direction of quantifying
architectures and is an attempt to bridge the gap between
language designers and computer architects. The metrics defined
can be used either to evaluate candidate intermediate languages
and select the most suitable machine language for a given
computing environment or to evaluate existing machines for a
given environment.

Hoevel [74] has argued that neither the machine language of
conventional machines nor the source language is an 'ideal' DEL,
either from the interpretation standpoint or from the storage
point of view. He contends that an 'ideal' DEL for a
contemporary computing system lies somewhere between its source
language and the language accepted by its base machine. In this
research, we attempt to prove that an 'ideal' DEL from the
semantic gap standpoint also lies somewhere between the source
language and the machine language. In the next section, software
metrics that will be used to quantify architectures are
defined. Results obtained so far are included.

Section  2 

Halstead and his students (Halstead [73],[77] and Software
Engineering [79]) found that application of the classical
methods of the natural sciences demonstrates that even such
intangible objects as written abstracts and computer programs
are governed by natural laws. Some of the metrics used by them
that are pertinent to the present work are now presented
without explanation. Interested readers should refer to
Halstead [77] for details.

1. The Volume V: A suitable metric for the size of any
implementation of an algorithm, called the volume V, can be
defined as

$$ V = N \log_2 n \tag{1} $$

where N is its length and n is the size of its vocabulary.

2. The Potential Volume V*: The most succinct form in which an
algorithm could ever be expressed would require the prior
existence of a language in which the required operation was
already defined or implemented, perhaps as a subroutine or a
procedure. The potential volume of an algorithm is the volume
of the program which expresses the algorithm in its most
succinct form.

$$ V^* = (2 + n_2^*) \log_2 (2 + n_2^*) \tag{2} $$

where $n_2^*$ is the number of unique operands.

3. The Level of a Program L: Since there can be more than one
possible implementation of an algorithm, it is necessary to
define the level of a program. The level of a program L is
defined as

$$ L = V^* / V \tag{3} $$

4. The Level of a Language $\lambda$: When different algorithms
are programmed in a given implementation language, it is
observed (Halstead [77]) that as the potential volume V*
increases the program level L decreases proportionately.
Consequently, the product L times V* remains constant for any
language. This product, the language level, is denoted by
$\lambda$:

$$ \lambda = L \times V^* \tag{4} $$

The four quantities defined above form the basis of our
research. In order to discuss the details a few more terms must
be introduced.
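As a small illustration, equations (1) to (4) can be evaluated directly once the counts are known. The sketch below is not taken from the paper: the function names and the sample counts (N = 64, n = 16, two unique operands) are made up for the example, and the counting of operators and operands is assumed to have been done beforehand.

```python
import math


def volume(length: int, vocabulary: int) -> float:
    """Equation (1): V = N * log2(n)."""
    return length * math.log2(vocabulary)


def potential_volume(unique_operands: int) -> float:
    """Equation (2): V* = (2 + n2*) * log2(2 + n2*)."""
    k = 2 + unique_operands
    return k * math.log2(k)


def program_level(v_star: float, v: float) -> float:
    """Equation (3): L = V* / V."""
    return v_star / v


def language_level(level: float, v_star: float) -> float:
    """Equation (4): lambda = L * V*."""
    return level * v_star


# Hypothetical counts for one program (not data from the paper).
N, n, n2_star = 64, 16, 2

V = volume(N, n)                    # 64 * 4 = 256.0
V_star = potential_volume(n2_star)  # 4 * 2 = 8.0
L = program_level(V_star, V)        # 8 / 256 = 0.03125
lam = language_level(L, V_star)     # 0.03125 * 8 = 0.25

print(V, V_star, L, lam)
```

The sample counts are powers of two only so that the logarithms come out exact; real programs would of course give fractional volumes.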

5. Level of a Machine with respect to a Language $\ell_L^M$:
Certain machines are more closely related to the operations and
data structures in a high-level language than other machines. A
measurable quantity that describes this characteristic of a
machine is in order. The level of a machine M with respect to a
language L, $\ell_L^M$, is defined as

$$ \ell_L^M = V_L / V_M \tag{5} $$

where $V_L$ is the volume of an algorithm implementation in the
language L and $V_M$ is the volume in the machine language of
the machine M.

Remarks: 1. The authors strongly believe that the quantity in
equation (5) is a constant for a given machine M and a language
L (and a compiler) and does not vary significantly with either
algorithms or programming styles.

2. Compiler overhead is included while measuring the volume
$V_M$ in equation (5). Thus, $V_M$ is the volume of the program
translated into machine language M by a compiler starting with
the program in the high-level language L. This approach is used
for practical reasons.


3. If compiler overhead is to be excluded, a different metric,
the Potential Level of a Machine $\ell_L^{*M}$, may be used:

$$ \ell_L^{*M} = \lambda_M / \lambda_L \tag{6} $$

where $\lambda_M$ is the level of the machine language of
machine M and $\lambda_L$ is the level of the high-level
language L. Potential level can be greater than 1 since it is
possible to have a machine language whose level is higher than
that of a high-level language. The performance of a compiler
can be evaluated using the two levels defined above.

Some Results: 32 FORTRAN programs written by graduate and
freshman computer science students at SMU are used in our
validation of equation (5). Operators and operands in the
programs used are counted according to the rules suggested by
Bulut [74],[73]. The results are given in Table 1. When these
values are plotted (Fig. 1), a straight line relation between
the two volumes with a correlation coefficient of 0.978 is
observed. From the plot, the level of Compass (assembler
language of Cyber) with respect to FORTRAN (using the FTN
compiler with OPT = 0) is given by the slope of the curve:

$$ \ell_{FORTRAN}^{Compass} = 0.171686 $$
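The slope and correlation coefficient quoted above can be obtained with an ordinary least-squares fit. The following is a minimal sketch, assuming the machine-language volume is plotted along the x-axis and the high-level language volume along the y-axis; the function name `fit_line` and the four data points are invented for illustration and are not the measurements of Table 1.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up volumes for four programs (NOT the data of Table 1).
v_machine = [1000.0, 2000.0, 3000.0, 4000.0]  # volume in assembler language
v_source = [170.0, 340.0, 510.0, 680.0]       # volume in high-level language

slope, intercept, r = fit_line(v_machine, v_source)
print(f"level = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```

With perfectly proportional sample data the fit returns a slope of 0.17 and a correlation of 1; measured volumes would scatter around the line.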


Similar computations are performed on COBOL and the results are
tabulated in Table 2. The level of Compass with respect to
COBOL is calculated to be (Fig. 2)

$$ \ell_{COBOL}^{Compass} = 0.17147 $$

6. Dynamic Volume $V_d$: The volume of an algorithm defined in
(1) is a static measure of the size of the algorithm and it can
be used as an estimate of the memory required. However, the
actual amount of code processed by the computer is different
for different sets of data. Depending on the input, certain
segments of the program may be executed more often than other
segments. The Dynamic Volume of a program is the code of the
program that is actually processed for a given set of data.

7. Average Information Rate $I_M$: Since it is possible to
conceive of two machines with the same architecture where one
machine executes programs faster than the other (e.g. the
various models of the IBM/370 series), a measure of the
processing speed of machines must be defined. The average
information rate $I_M$ of a machine M is such a quantity and is
given by

[Equation (7) is missing from the source text.]

where T is a sufficiently long time period over which the
behavior of the program is observed, $V_d$ is the dynamic
volume of the program in the machine language and dT is the
execution time of the program for the dynamic volume $V_d$.

Remarks: 1. To evaluate equation (7), a program (or a set of
programs) must be executed with different sets of data. For
each set of data, the dynamic volume $V_d$ and the execution
time dT must be noted. Then, the integration in equation (7)
can be approximated by summation.

2. The product $\ell_L^M \times I_M$ is a measure of the speed
at which programs written in a high-level language L are
executed on machine M.

Some Results: A simple program is run on Cyber 72 a number of
times with various values for input data. The values of
execution time for various dynamic volumes are plotted in
Fig. 3. As can be seen from the graph, the rate at which Cyber
processes information is fairly constant and is given by the
slope of the graph.

$$ I_{Cyber\,72} = 25.746 \times 10^{x} \ \text{bits/sec} $$

(the exponent x is illegible in the source).

Applications

Although the results obtained so far are not enough to claim
the validity of our metrics, they tend to support our
intuition. However, since intuition is far from trustworthy, we
are planning to collect data for three languages (FORTRAN,
Pascal, and COBOL) and on three architectures (Cyber, AMDAHL,
and TI 9900). We believe that this set is a representative
class of the languages and machines most commonly used.

Once the consistency of these metrics has been validated, they
can be used to select a machine language that is best suited
for a computing environment. Denoting the set of programming
languages under consideration by P, the machine language for
which the quantity

$$ \sum_{L \in P} (k_L \times \ell_L^M) \tag{8} $$

is maximum describes an architecture with a minimum semantic
gap for the set of programming languages P. The constant $k_L$
in equation (8) is a weighting factor that reflects the
frequency of usage of language L in a particular environment.
Typically, if COBOL is used 90% of the time in a given
environment, $k_{COBOL}$ will take a value of 0.9.

Equation (8) can also be used to evaluate existing
architectures for a given environment. Use of the metrics
defined in this paper provides useful information on the basic
architecture of the machine, and the implementation details
such as the information processing rate are separated from the
architecture. This information is not provided by benchmarks,
which reflect only the speed of execution of the benchmark
programs on the machine. However, the authors believe that the
counting techniques suggested by Bulut [74],[73] must be
refined before existing architectures can be compared using our
metrics.
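The selection rule of equation (8) amounts to a weighted sum per candidate machine followed by taking the maximum. A minimal sketch follows; the machine names, the level values and the usage weights are all hypothetical, chosen only to show the mechanics, and do not come from the paper's measurements.

```python
def weighted_level(levels: dict, weights: dict) -> float:
    """Equation (8): sum over languages L in P of k_L * (level of M w.r.t. L)."""
    return sum(weights[lang] * levels[lang] for lang in weights)


# Hypothetical usage weights k_L for one computing environment.
weights = {"COBOL": 0.9, "FORTRAN": 0.1}

# Hypothetical machine levels with respect to each language.
candidates = {
    "machine_A": {"COBOL": 0.17, "FORTRAN": 0.17},
    "machine_B": {"COBOL": 0.25, "FORTRAN": 0.10},
}

scores = {name: weighted_level(levels, weights)
          for name, levels in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("smallest semantic gap for this environment:", best)
```

In this invented environment machine_B wins because its advantage on the heavily used language outweighs its weakness on the rarely used one, which is the effect the weighting factor is meant to capture.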


Observations

While compiling FORTRAN programs, we tried the various
optimisations that are available on the FTN compiler. After
looking at the code generated, we decided to use only the code
generated using the FTN compiler with no optimisation. The
reason for this is the fact that optimisation is not linear;
only certain portions of the program are optimised. For
example, no attempt is made to reduce the code required to
implement subroutine calls and passing of parameters. Thus, if
a program has a large number of subroutine calls, the amounts
of code generated by the optimising compiler and the regular
compiler are almost the same. This nonlinearity leads to an
unfair comparison of FORTRAN programs.

We also observed that on Cyber 72, there are a few system
macros to execute the most commonly occurring FORTRAN functions
like format conversions for READ and WRITE statements. A
similar observation can be made in connection with COBOL
programs. So, the authors would like to stress the fact that
the numbers obtained are for a virtual machine as viewed by a
compiler writer. However, the use of such macros strengthens
our belief that a new machine language which has a higher level
than conventional machine language is needed to improve the
performance.

Conclusion 

In this paper, the authors have attempted to introduce the
subject of their research. The authors started out with an
assumption that there exists an 'ideal' machine language which
has most of the advantages of high-level language processors
while retaining the flexibility of conventional von Neumann
machines. In order to prove this conjecture, a few metrics are
defined. Using these metrics, a most suitable machine language
for a given computing environment can be designed.

Although the actual values of our metrics may change if a
different counting technique is used, the conclusions are still
valid. The values obtained must be used only to compare two
languages and no significance must be attached to the absolute
values.

In our research, one basic assumption is that the language in
which a program is written is the best language for that
algorithm. However, we did not see any published results
claiming the superiority of one language for a particular
application. Our method can be extended to evaluate various
programming languages for a given application. In order to do
this, one has to write a number of programs (within a given
area of application) in a set of programming languages and
measure the volumes of these algorithms in the different
languages. The high-level language that has an overall minimum
volume for the set of programs is the best implementation
language for the area of application under consideration. Once
again, we would like to caution the reader that the counting
techniques may have to be refined before our method can be used
for the suggested applications.


It is probably too early to outline the machine characteristics
that cause semantic gap, but we observed that direct execution
of a few high-level instructions would enhance the performance
of computers appreciably. These instructions are very similar
to the semantic primitives suggested by Lancaster [72].

Table 1. Validation of Equation (5)
FTN compiler with OPT = 0 is used.
[Only fragments of the table body survive, in source order:
1149.2961, 8*8.8998, 519.9212, 3646.4257, 1193.0721, 6834.4131.]

Table 2. Validation of Equation (5)
COBOL compiler on Cyber 72 is used.
V_COBOL is the volume in COBOL. V_Compass is the volume in
Compass.
[Surviving values, in source order: 339.001500, 7949.2895,
245.969780, 6198.6132, 159.911340, 3831.2925, 286.620880,
5333.9861, 95.908275, 2028.3122.]

Figure 2. Validation of Equation (5)

Table 3. Validation of Equation (7)
V_d is the dynamic volume in Compass. dT is the execution time
on Cyber 72.

Figure 3. Validation of Equation (7)
V_d x 10^8 along X-axis. dT along Y-axis.
(units: V_d in bits, dT in seconds)
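The Remarks on equation (7) state that the integration can be approximated by summation over runs with different input data. One way to read this, offered here only as an assumption because equation (7) itself does not survive in the text, is to divide the total dynamic volume processed by the total execution time. The function name and the three sample runs below are invented for illustration.

```python
def average_information_rate(runs):
    """Approximate I_M as (sum of dynamic volumes) / (sum of execution times).

    `runs` is a list of (V_d in bits, dT in seconds) pairs, one per data set.
    This reading of equation (7) is an assumption, not the paper's formula.
    """
    total_volume = sum(v_d for v_d, _ in runs)
    total_time = sum(d_t for _, d_t in runs)
    if total_time <= 0:
        raise ValueError("total execution time must be positive")
    return total_volume / total_time


# Made-up measurements: (dynamic volume in bits, execution time in seconds).
runs = [(2.5e6, 0.125), (5.0e6, 0.25), (7.5e6, 0.375)]

rate = average_information_rate(runs)
print(f"average information rate: {rate:.3e} bits/sec")
```

If the rate is indeed constant across runs, as the paper reports for the Cyber 72, this ratio coincides with the rate obtained from a straight-line fit through the plotted points.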
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A  DIRECTLY  EXECUTABLE  LANGUAGE  SUITABLE  , 
FOR A BIT SLICE MICROPROCESSOR IMPLEMENTATION¹
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Abstract 

Directly executed languages (DELs) as proposed by Flynn have
variable sized fields for both operators and operands. For
efficient implementation this architecture requires access to
main memory at the bit level, and also requires powerful
operations on variable sized bit fields in the host processor.
For a hardware architecture based on bit slice processors and
byte addressable memory it may be more advantageous to consider
a byte oriented DEL. This simplifies the memory access hardware
and makes the decoding of the DEL code a straightforward
look-up procedure. This paper reports on a project to build a
Pascal oriented micro processor (POMP) and compares the POMP
encoding of instructions with those of the DEL code. Initial
results indicate that POMP code is less than fifty percent
larger than DEL code and hence will be preferable when
simplicity of interpretation is required.


Introduction 

A Pascal oriented micro processor is being built at Trinity
College Dublin using AMD bit slice processors. It will be used
for research into the emulation of intermediate forms for block
structured languages. Pascal will be the initial language
considered and will also be used in all examples in this paper.
During the design an architecture to efficiently support
Flynn's DELs was considered. It would have required bit
addressable memory, and operators for variable sized bit
fields. Instead an architecture based on byte sized
instructions was chosen to give easier interpretation and a
simpler main memory interface. It was also felt that a compiler
producing byte sized instructions would be easier to construct
than one producing DEL code. The only disadvantage is the loss
of compactness of code. This paper reports on initial
investigations into the comparison of the two encodings and
considers the tradeoff between compactness of code and ease of
interpretation.


¹ The work described herein was supported in part by the Army
Research Office - Durham under contract no. DAAG29-78-0205.


Directly Executable Languages

The aim of DEL code [1,2,3] is to provide an ideal architecture
for any high level language. The encoding is not optimum in the
Huffman coding sense but is instead a compromise between
compact coding and ease of interpretation. "An ideal
representation must be concise in its coding of identifiers yet
not so concise that it exacerbates interpretation" [2 page 22].
In this architecture the scope of an identifier in a procedure
is very important. The address of an identifier is given as the
address (offset) within the contour. Hence, the number of bits
required to hold an address is given by log2 (V), rounded up to
a whole number of bits, where V is the number of unique
identifiers within the scope. Operators can also be encoded in
this manner but the number of operators is small and hence a
fixed encoding may be used instead. The DEL code instructions
mirror the operations in the high level language, giving three
address type instructions. When the stack is required in
expression evaluation then all loads and stores come as
additions to the main operation being performed. In a sense
they come for free. The loads and stores are not explicitly
given in the DEL code; instead they are implicitly applied as
part of other operations. Thirty two formats specify all the
different forms of the three address instructions. The DEL
encoding for a number of expressions is given in figure 1. The
encoding contains the format, operands and operations fields.
In the format field A, B and C represent the three operands of
an instruction when they are not in the stack. S represents the
resulting operand pushed on top of the stack, T represents the
operand on top of the stack, which may be popped if required,
and U is the next to top operand on the stack.
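The dependence of the operand field width on the size of the scope can be made concrete with a few lines of code. This is only a sketch: the function name is made up, and rounding log2 (V) up to the next whole bit is an assumption about how a fractional result would be handled.

```python
def operand_field_bits(identifiers_in_scope: int) -> int:
    """Bits needed to address one of V identifiers: ceil(log2(V)).

    Uses integer arithmetic ((V - 1).bit_length()) to avoid
    floating-point rounding; a scope with one identifier needs 0 bits.
    """
    if identifiers_in_scope < 1:
        raise ValueError("a scope must contain at least one identifier")
    return (identifiers_in_scope - 1).bit_length()


# Field widths for a few hypothetical scope sizes.
for v in (1, 2, 5, 8, 9, 200):
    print(f"{v:3d} identifiers -> {operand_field_bits(v)} bit operand field")
```

A procedure with five identifiers in scope would thus need three-bit operand fields, while a byte oriented encoding spends a full eight bits whatever the scope size; this is the compactness that the byte oriented approach gives up.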

In examples 1 and 2 the operation is performed without the use
of the stack. In examples 3 and 4 the stack is used and its use
is indicated by the format of the instruction. In all the
examples there is one DEL code instruction for each operator in
the high level language expression. Note also that the stack
loads and stores are implied by the format and combined with
the instruction operation. One memory reference is saved in
example 4 where the identifier K appears more than once.
Conditional statements and the addressing of arrays can also be
accomplished in a similar manner, as shown in figure 2.

The if statement in example 1 produces a DEL code instruction
to test the condition and skip if the condition is not true,
and another instruction to evaluate the expression K := K + 1,
which is executed if the condition is true. In example 2 the
array address calculation is considered as a single operation
along with the assignment, and in example 3 it is considered as
a single operation along with loading or storing from the
stack.

This encoding produces compact code, anywhere from three to
eight times more compact than that produced by compilers for
traditional machines.


Pascal  Oriented  Microprocessor 


The size of procedures written in a structured manner, using a
high level language, tends to be small [4]. The most frequently
occurring statement is assignment, followed by procedure call,
if and return. Assignment statements tend to be very simple,
with the majority having only one or two terms on the right
hand side. The majority of procedures have a small number of
formal parameters and a small number of local scalar variables.
Hence, the addresses of local variables and the most frequently
occurring global variables may be compactly encoded. The coding
of procedure calls must also be carefully considered.


A significant compaction of code can be gained from the fact
that during the execution of any Pascal statement the state of
the processor is always known, e.g. integer or real. Between
statements the state of the processor returns to the null
state. For example the statements

var J,K : integer ; A,B : real;

J := K + TRUNC(A + B)

produce the following instructions for a stack machine. The
state of the processor is also given.

Instructions      Processor State

                  => Null
LOAD K            => Integer
LOAD A            => Real
LOAD B            => Real
ADD               => Real
TRC               => Integer
ADD               => Integer
STORE J           => Null
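The state transitions in this listing can be traced mechanically. The sketch below is a simplified model rather than the POMP microcode: it assumes that a load sets the state to the type of the variable loaded, that TRC sets it to integer, that a store returns it to null, and that ADD leaves it unchanged. It also assumes the assigned variable is J.

```python
NULL, INTEGER, REAL = "Null", "Integer", "Real"

# Declared types from: var J,K : integer ; A,B : real;
VAR_TYPES = {"J": INTEGER, "K": INTEGER, "A": REAL, "B": REAL}


def trace_states(program):
    """Return the processor state before the first and after each instruction."""
    state = NULL
    states = [state]
    for op, *args in program:
        if op == "LOAD":
            state = VAR_TYPES[args[0]]   # a load sets the state
        elif op == "TRC":
            state = INTEGER              # truncate: real -> integer
        elif op == "STORE":
            state = NULL                 # end of statement
        # ADD is state dependent but does not change the state
        states.append(state)
    return states


program = [("LOAD", "K"), ("LOAD", "A"), ("LOAD", "B"),
           ("ADD",), ("TRC",), ("ADD",), ("STORE", "J")]

for (op, *args), state in zip(program, trace_states(program)[1:]):
    print(f"{op:5s} {' '.join(args):2s} => {state}")
```

The same ADD opcode appears twice in the program but stands for a real addition the first time and an integer addition the second, which is what allows one opcode value to serve several operations.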

The processor state is null between each statement and is set
by load instructions and, in this example, by the truncate
instruction also. Advantage can be taken of this fact [5] to
provide a two dimensional instruction set, thereby greatly
increasing the range of opcodes available. Eight states of the
processor are used.

0 - Null
1 - Boolean
2 - ASCII (Character)
3 - Address (pointer)
4 - Bit address (for packed structures)
5 - Integer
6 - Real
7 - Set

The state is contained in a three bit field in the processor's
PSW. Some instructions are state independent, e.g. loads, and
hence the opcode range is divided into a state dependent and a
state independent range. Assuming that these ranges are equal
in size then there are 1152 potential opcodes (128 state
dependent opcodes in each of the eight states, plus 128 state
independent opcodes). For this architecture the low end of the
opcode range is for state dependent instructions. The opcode
range X'00' to X'2F' has been reserved for zero address
instructions. The opcode X'10' represents integer addition if
the processor state is integer and real addition if the state
is real. The null and boolean states are used for unconditional
jumps and false jumps respectively. A few lines from this area
of the opcode table are shown in figure 3. Each opcode
represents five different operations depending on the processor
state.

Branch instructions are implemented in both a short and a long
form. The short form is given in this area of the opcode table
and consists of two bytes in the following form.

[Diagram: two byte short branch format, with the opcode field
(000) followed by the offset field.]

This requires thirty two opcodes X'00' to X'1F'. The long form
jump consists of an opcode followed by a two byte offset.

Load instructions, which are state independent, are used to
load the stack and also set the processor's state. Eight of
these are provided for each of the states: boolean, ASCII,
address, integer, real and set. Three bits within the byte give
the local variable number and the format is

[Diagram: one byte load format, with the opcode field followed
by the three bit local variable number.]

The long form of these instructions is used if the procedure
has more than eight local variables - this will occur six
percent of the time [4].
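The opcode count and the short load format described above can be checked with a little arithmetic. In the sketch below the one byte opcode and the equal split of the opcode range come from the text, while the placement of the local variable number in the low three bits of the byte is an assumption made for illustration (the format diagram does not survive), as are the names used.

```python
STATE_BITS = 3                     # state field in the PSW
NUM_STATES = 1 << STATE_BITS       # eight processor states
OPCODE_VALUES = 256                # one byte opcodes

# Equal split between state dependent and state independent ranges.
state_dependent = OPCODE_VALUES // 2
state_independent = OPCODE_VALUES - state_dependent

# Each state dependent opcode can mean something different in every state.
potential_opcodes = state_dependent * NUM_STATES + state_independent
print("potential opcodes:", potential_opcodes)


def decode_short_load(byte: int):
    """Split a one byte load into (opcode group, local variable number).

    Assumed layout: high five bits select the load group, low three
    bits give the local variable number (0..7).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return byte >> 3, byte & 0b111


group, local_var = decode_short_load(0b10101_011)
print(f"group {group:05b}, local variable {local_var}")
```

Three bits address exactly eight local variables, which is why a procedure with more locals has to fall back on the long form of the instruction.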

Separate one byte opcodes are used to perform operations
between the top of the stack and eight local variables. If a
local variable is added to the top of the stack this requires a
one byte instruction rather than two instructions in the
conventional stack machine. These instructions are two
dimensional in that the meaning of the operation also depends
on the processor state. The operations involved are add,
subtract, multiply, divide, compare for equality, compare for
inequality and store. The null state of this part of the opcode
table cannot be used with these operations and hence is used to
zero local integer, increment local integer, and decrement
local integer. These instructions also have eight different
opcodes for eight local variables and again they replace either
three or two instructions in the conventional stack machine.

Test  Results 

The code generator (the assembler) of the P code compiler was
modified in order to obtain a feel for the compactness of the
POMP code. The modified compiler produces either P code [6] or
a combination of P code and POMP code depending on the setting
of a number of control flags. These flags are used to test out
the relative importance of compacting different P code
instructions rather than only obtaining the total effect. The P
compiler produces code for a stack machine where all operations
are performed on the top of the stack. Hence no advantage could
be taken of the POMP instructions which operate between local
variables and the top of the stack. For example the expressions
A := B * C and A := A + 1 produce the following P code and POMP
code:

A := B * C

P code - LOAD B        POMP code - LOAD B
         LOAD C                    MUL C
         MUL                       STA A
         STA A

A := A + 1

P code - LOAD A        POMP code - INC A
         INC 1
         STA A

The four areas most easily implemented and which were
considered to result in the greatest compaction are:

1) Short branches - offset relative to PC

2) Loading and storing local variables

3) Loading small integers (0, 1 and 2), loading boolean true or
   false, increment and decrement top of stack by 1

4) Zero address operators, i.e. acting on the top two elements
   of the stack.

The P code compiler produces one P code instruction per 32 bit
computer word. No compaction of the code was considered.

The results were then compared with the DEL code produced by a
DEL compiler being implemented at the Stanford Emulation
Laboratory. The object code sizes produced by compiling a
quicksort program on the three different compilers were:

            DEL     POMP code   P code
Size        292     430         1004
Factor      1.0     1.47        3.44

The reduction in P code size due to the POMP instructions,
broken down by compaction type, was:

Compaction      Number of       Compaction type
in bytes (%)    instructions

 34 ( 6)         17             Short branches
267 (47)         89             Load and store local variables
129 (22)         43             Load integers 0, 1, 2, load
                                boolean true or false,
                                increment and decrement by 1
144 (25)         48             Zero address instructions -
                                operating on top 2 elements of
                                the stack
574             197             (total)

The total number of instructions is 251 for both the P code and
POMP code. In the POMP code they are broken down into 180 one
byte instructions, 17 two byte instructions and 54 P code
instructions. Almost half of the compaction is achieved by
compacting the load and store locals. In contrast the short
branches had almost no effect (6%).


From the preliminary results it looks improbable that the POMP
code produced from the P code compiler can achieve the
compactness of the DEL code. To match it, each POMP instruction
would have to occupy only 1.16 bytes on average (292 bytes
spread over 251 instructions), since the DEL code for this
program consists of only 68 instructions compared to 251 for
the P compiler: a factor of 3.7. Even allowing for the fact
that DEL operators often have implicit loads and stores
associated with them there is still a remarkable difference in
the number of operations. Hence the P code compiler has been
discarded and present work is using a Pascal compiler which
generates an abstract syntax tree during parsing. Using this
compiler the full POMP code can be generated, including the
instructions which operate between local variables and the top
of the stack. Statistics will also be generated on fourteen
substantial Pascal programs giving the frequency of operators
and memory references, and the resulting DEL and POMP codes
will be compared.

An advantage put forward for minimizing the
number of instructions of the object code is that
it speeds up the execution. A large number of
instructions increases the fetch and decoding time,
but with instruction prefetch and with simple ROM
look up decoding it is expected that the difference
in execution speed due to this effect will be small.

Figure 1 (format examples AAB, SAB, TTA, ATB and SAA;
the original layout is not recoverable)

Expression              Format   Operands          Operation   Stack

1) If K = J then        SAB      K J Skip offset   <> GoTo
      K := K + 1        AAB      K 1               +           -

2) K := H[U]            ARRAYA   U H K             A := B[X]   -

3) H[K] := M[J] + N[J]  ARRAYA   J M               S := A[X]   M[J]
                        ARRAYA   J N               S := A[X]   N[J],M[J]
                        TUT                        +           M[J] + N[J]
                        ARRAYA   K H               A[X] := T



PARTIAL EVALUATION OF A HIGH-LEVEL ARCHITECTURE

Lars-Erik Thorelli, Department of Telecommunication and
Computer Systems,

Royal Institute of Technology,

S-100 44 Stockholm, Sweden


The architecture of the high-level language
machine LAX2, designed for efficiency in string
manipulation and interactive applications, is
evaluated with respect to program volume and
number of interpreted instruction bits. The evaluation
takes the form of a comparison with the
PDP-11 architecture using as test data a set of
complete, realistic programs from a well-known
source. The result shows the superiority of the
high-level architecture.


It  turns  out  that  LAX2  uses  significantly  fewer 
bits  for  instructions,  both  statically  and 
dynamically.  Thus,  the  present  study  gives  yet 
another  example  of  the  superiority  of  high-level 
architecture,  designed  from  language  and  appli¬ 
cation  considerations,  over  conventional  archi¬ 
tecture.  After  a  short  description  of  the  high- 
level  architecture  the  evaluation  method  and 
results  are  presented.  The  concluding  sections 
compare  the  present  work  with  earlier  evaluation 
studies  and  discuss  the  significance  of  the 
results. 


Introduction 

LAX2 (1,2) is a high-level architecture designed
to be efficient for string manipulation and interactive
applications. It has type-marked values,
dynamic storage allocation, and powerful instructions
for string manipulation. The language of the
machine is specified in two levels, a source or
text level TLAX and an executable level ELAX.

There are no GOTO's in TLAX; all jumps are generated
from high-level control structures by the
simple TLAX - ELAX compiler which is a fixed part
of the machine. Memory is split into a number
of data and program blocks; relative and indirect
addressing is used with out-of-bounds checking to
achieve compact code and high reliability.

The main design goals for LAX2 are low cost for
software production and good memory and execution
time economy for the intended class of applications.
The design has been heavily influenced by the concepts
of structured programming. The architecture
has been implemented as a partially microcoded
interpreter on a Varian V73 minicomputer.

The present paper reports on an evaluation of
the LAX2 architecture. The evaluation is only concerned
with memory and execution time economy,
leaving out completely aspects such as ease of programming
and debugging, software security, and ease
of compilation. Furthermore, the number of interpreted
instruction bits, rather than physical execution
time, is used as the dynamic measure. The
evaluation consists of a comparison of LAX2 with
PDP-11, using a set of programs taken from the
well-known book Software Tools by Kernighan and
Plauger (3).


The  work  was  done  while  the  author  was 


Short  description  of  the  high-level  architecture 

LAX2 is a tagged architecture (4). Its design presupposes
a basic word format of 16 bits. Currently
the machine recognizes types of values according
to Figure 1.

Simple types: nil, boolean, character, index
(integer in the range 0-16383)

Composite types:

  string (of characters)
  node (heterogeneous array)
  decimal (decimally represented integer)
  prog (executable procedure)
  coprog (coroutine activation)
  channel (for input or output)

(real, realarray planned, not yet implemented)


Figure 1. LAX2 data types


A value of simple type is represented by one
16 bit word with its leftmost bit cleared. A composite
value is represented by a 16 bit word, the
head, whose leftmost bit is set, pointing to a
memory block, the body, containing a type-and-length
descriptor and the value proper.
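
The distinction between simple values and heads of composite values can be illustrated with a small sketch. It relies only on what is stated here (16 bit words, leftmost bit as the marker, index range 0-16383); how the remaining bits are laid out is not specified, so the sketch does not interpret them. The names `is_composite` and `is_valid_index` are hypothetical.

```python
# Sketch of the LAX2 value word described above: a 16 bit word whose
# leftmost bit distinguishes simple values (0) from heads of composite
# values (1). How the remaining 15 bits are used is NOT specified in the
# text, so this sketch only inspects the leftmost bit.

WORD_MASK = 0xFFFF       # a value occupies one 16 bit word
COMPOSITE_BIT = 0x8000   # leftmost bit of the word
MAX_INDEX = 16383        # upper bound of the 'index' type (Figure 1)


def is_composite(word):
    """True if the word is the head of a composite value."""
    if not 0 <= word <= WORD_MASK:
        raise ValueError("not a 16 bit word")
    return bool(word & COMPOSITE_BIT)


def is_valid_index(n):
    """True if n lies in the range of the simple type 'index'."""
    return 0 <= n <= MAX_INDEX


print(is_composite(0x0005))   # leftmost bit cleared
print(is_composite(0x8005))   # leftmost bit set
print(is_valid_index(16383), is_valid_index(16384))
```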

The memory area of a LAX2 process is divided
into a stack in which procedure activation records
are allocated, and a heap, where compactifying
garbage collection is performed when necessary
(Figure 2).

Figure 2. Memory area of LAX2 process


An executable procedure, i.e. a prog value, can
only be created by means of the LAX2 instruction
'compile', taking a string, the TLAX version of the
procedure, as main argument. The body of a prog
value is shown (with some simplification) in
Figure 3. Each rectangle represents a 16 bit word.


administrative | own variables      | ELAX code
overhead       | (not more than 31) |


Figure 3. A prog value


The ELAX code can only access the own variables
and stack variables (locals and parameters) of the
current activation record. Figure 4 shows the structure
of an activation record on the stack. The
stack variables are also represented by one word
each, and their number may not exceed 32. In this
way addresses to commonly referenced quantities are
kept very short. More remote information is
reached through indirect addressing. A complete
user program consists of a network of prog and
data values linked by the own variables of the
progs.

stack variables
(one word each, containing
simple value or head of
composite value)


Figure 4. Activation record


Dynamic  type  checking  and  the  static  checking 
performed  by  the  'compile'  instruction  catch  a 
great  number  of  possible  programming  errors. 

Another  feature  promoting  the  efficient  production 
of  reliable  software  is  absence  of  jump  instruc¬ 
tions  in  the  TLAX  representation.  All  jumps  are 
generated  during  compilation  from  high-level  con¬ 
trol  structures.  In  many  other  respects  TLAX  offers 
a  rather  primitive  notation  which,  together  with 
the  high  level  of  ELAX,  makes  the  compilation  pro¬ 
cess  simple. 


ELAX code consists of a sequence of 8 bit bytes.
The design is similar to that of EM-1 (Tanenbaum (5))
and is characterized by compactness and the possibility
of fast instruction decoding. Figure 5 shows
some simple statements in Algol-like notation and
their ELAX counterparts.


Statement   ELAX code                    No of bytes

A:=B+3      push B, push 3, add,             5
            locate A, store
A:=A-B      push B, locate A, minus          3
A:=A+1      locate A, incr                   2
A:=0        locate A, clear                  2


Figure 5. Simple ELAX examples
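
The sequences of Figure 5 can be traced with a toy stack model. The sketch below is not the LAX2 interpreter: the tuple encoding and the name `run_elax` are hypothetical, and the assumption that every instruction shown occupies exactly one byte is inferred only from the byte counts in the figure.

```python
# Toy model of the ELAX sequences in Figure 5. The tuple encoding and the
# function name run_elax are hypothetical; only the instruction names and
# their informal meaning are taken from the text.

def run_elax(code, variables):
    """Execute a list of (instruction, operand) pairs on a dict of variables."""
    stack = []
    located = None                      # variable selected by the last 'locate'
    for op, arg in code:
        if op == "push":                # producer (variable) or small constant
            stack.append(variables[arg] if isinstance(arg, str) else arg)
        elif op == "add":               # takes two stack values, pushes the sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "locate":            # locator: start a locator-sequence
            located = arg
        elif op == "store":             # effector: write a popped value
            variables[located] = stack.pop()
        elif op == "minus":             # effector: subtract a popped value
            variables[located] -= stack.pop()
        elif op == "incr":              # effector: increment by 1
            variables[located] += 1
        elif op == "clear":             # effector: write the index value 0
            variables[located] = 0
        else:
            raise ValueError("unknown instruction: " + op)
    return variables


FIGURE_5 = {
    "A:=B+3": [("push", "B"), ("push", 3), ("add", None),
               ("locate", "A"), ("store", None)],
    "A:=A-B": [("push", "B"), ("locate", "A"), ("minus", None)],
    "A:=A+1": [("locate", "A"), ("incr", None)],
    "A:=0":   [("locate", "A"), ("clear", None)],
}

for statement, code in FIGURE_5.items():
    result = run_elax(code, {"A": 10, "B": 4})
    # Assuming one byte per instruction reproduces the counts of Figure 5.
    print(statement, "->", result["A"], "bytes:", len(code))
```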


LAX2 has a powerful set of string manipulation
instructions. A small example is given in Figure 6.
The guiding principle has been that although it
should be simple to dynamically create and throw
away strings, this feature should not be forced
upon the programmer, and that lexical and other
kinds of string analysis could be done with high
machine efficiency. The reader is referred to
(1,2) for further information on this and other
aspects of the LAX2 architecture. Appendix A
summarizes the ELAX instruction list.
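
The string analysis instructions used in Figure 6 appear to work with a cursor variable (V) that is advanced over the string as items are extracted. A rough Python model of that idea follows. The exact semantics of the LAX2 instructions (blank skipping, error handling, the precise syntax of an identifier) are not given in this paper, so everything below, including the helper names, is an assumption made for illustration.

```python
# Rough model of cursor-based string analysis. Each helper takes the string
# and a cursor position and returns (value, new_cursor). Skipping of blanks
# and the identifier syntax (letter followed by letters/digits) are
# assumptions, not documented LAX2 behaviour.

def skip_blanks(s, v):
    while v < len(s) and s[v] == " ":
        v += 1
    return v


def get_ident(s, v):
    v = skip_blanks(s, v)
    start = v
    if v < len(s) and s[v].isalpha():
        v += 1
        while v < len(s) and s[v].isalnum():
            v += 1
    return s[start:v], v


def get_char(s, v):
    v = skip_blanks(s, v)
    return s[v], v + 1


def get_index(s, v):
    v = skip_blanks(s, v)
    start = v
    while v < len(s) and s[v].isdigit():
        v += 1
    return int(s[start:v]), v


S = "count  +  42"
V = 0                       # corresponds to 'locate V, clear'
A, V = get_ident(S, V)
OP, V = get_char(S, V)
B, V = get_index(S, V)
print(A, OP, B)
```

The point of the design is that no intermediate substrings need to be created to walk over S; only the cursor changes between steps.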


Problem: The string S contains an identifier, an
operator symbol and an unsigned integer, possibly
separated by blanks. Assign the identifier
(a string) to A, the operator (a character)
to OP, and the integer (an index) to B.

ELAX solution: locate V, clear,
push S, locate V, getident, locate A, store,
push S, locate V, getchar, locate OP, store,
push S, locate V, getindex, locate B, store.

Total number of bytes: 17


Figure 6. String analysis example


The book Software Tools is highly suitable as a
source of benchmark programs for LAX2, since the
programs are complete, have been used in practice,
and are typical of the application area of the
machine. The programming language used in (3) is
Ratfor, a structured dialect of Fortran. The following
programs were selected for use in the investigation.

a. ENTAB ((3) pp 37,21,20).

Copies a text file, substituting each sequence of
spaces preceding a tab stop by a tab character.
Tab stops are located at each 8'th position in the
line.

Program size: 46 lines of source code (not counting
comment and blank lines).

b. COMPRESS ((3) p 44).

Produces a compressed version of a file using run
length compression, i.e., a sequence of identical
characters is encoded by length and character
value.

Program size: 36 lines.

c. PUTDEC ((3) pp 61,62, plus a main routine).

Converts integers to ASCII format and places them
in specified fields.

Program size: 38 lines (excl. main routine).

d. QUICKSORT ((3) pp 115,110,111, plus a main
routine).

Sorts a sequence of text lines into lexicographical
order by means of the well-known "quicksort" algorithm.

Program size: 66 lines (excl. main routine).

e. FIND ((3) pp 136-138).

Searches a file, outputting each line containing
a certain pattern given as input. The pattern is
essentially a regular expression.

Program size: 279 lines.

The set of programs is rather small but is
hoped to be representative of the text processing
application area.

Next, these programs were translated for the
two architectures LAX2 and PDP-11.

The translation for LAX2 was obtained as follows.
The programs were rewritten into the language HLAX,
a high-level (above TLAX) notation for LAX2. The
HLAX programs were compiled using a cross-compiler
on a DEC-10 computer. During the rewriting process
care was taken to stay close to the original programs.
As a consequence the programs run on LAX2
have, except for minor details, the same data and
program structures and use the same algorithms as
the original programs. This means that the features
of LAX2 have not been used to full advantage.
However, an additional version (FIND.OPT) of FIND,
optimized for LAX2, was written. The optimization
relies mainly on the observation that a majority
of search patterns consist of or start with a literal
string. Therefore it should pay to modify the internal
representation of patterns and use the substring
searching 'part' instruction of LAX2. Also,
a recursive version (QUICK.REC) was written in
addition to the non-recursive version from (3).

To translate the programs to PDP-11 code the
language C (6) was used. As before, the rewriting
was done to faithfully preserve the given algorithms
and structure. To obtain high quality
machine code all features of the C language promoting
this goal were used, including the possibility
of declaring quantities to reside in registers.
The programs were compiled using the optimizing
compiler available under UNIX. As a result
of these measures we believe that the machine code
is as efficient as that produced by a competent
assembly language programmer, with the possible
exception that the latter may in some cases feel
inclined to use a less general subroutine calling
sequence, to save time for the saving and restoring
of registers. In addition to the five programs from
(3) the recursive sort program QUICK.REC was
produced also for PDP-11.


The results

The  volumes  of  the  programs,  excluding  input/ 
output  routines,  were  measured.  The  volume  is  de¬ 
fined  as  the  size  of  the  executable  form  of  the 
program  including  statically  allocated  data.  The 
sorting  programs  operate  on  data  in  primary  storage; 
this  space  is  not  included,  as  its  size  depends  on 
the  size  of  the  input. 

The  result  is  displayed  in  Table  1. 


Program          Result    Result    LAX2/PDP11
                 PDP-11    LAX2      in %

ENTAB              204       189        93
COMPRESS           251       210        84
PUTDEC (1)          97        60        62
QUICKSORT (1)      233       132        57
QUICK.REC (1)      175        77        44
FIND              1282       776        61
FIND.OPT         (1282)      841        66

Total (2)         2242      1444        64

Notes: 1: excluding main program
       2: excluding FIND.OPT


Table 1. Program volumes



The  high  percentage  figures  for  the  first  two 
programs  are  explained  by  the  fact  that  they  use 
data  structures  whose  sizes  dominate  over  the  sizes 
of  the  programs  proper. 
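
The percentages in Table 1 can be recomputed from the two volume columns. The short check below copies the figures from the table; the dictionary layout and the helper name `percent` are simply a convenient encoding for this check and carry no meaning beyond it.

```python
# Recompute the LAX2/PDP-11 percentages from the volumes in Table 1.
# The figures are copied from the table; the dictionary layout is just a
# convenient encoding for this check.

volumes = {                 # program: (PDP-11 volume, LAX2 volume)
    "ENTAB":     (204, 189),
    "COMPRESS":  (251, 210),
    "PUTDEC":    (97, 60),
    "QUICKSORT": (233, 132),
    "QUICK.REC": (175, 77),
    "FIND":      (1282, 776),
}


def percent(lax2, pdp11):
    """LAX2 volume as a percentage of the PDP-11 volume, rounded."""
    return round(100 * lax2 / pdp11)


ratios = {name: percent(lax2, pdp11)
          for name, (pdp11, lax2) in volumes.items()}

total_pdp11 = sum(pdp11 for pdp11, _ in volumes.values())
total_lax2 = sum(lax2 for _, lax2 in volumes.values())

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:3d} %")
print("Total", total_pdp11, total_lax2, percent(total_lax2, total_pdp11), "%")
```

The totals exclude FIND.OPT, in line with note 2 of the table.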

In  addition  to  these  static  results,  dynamic 
measurements  were  derived.  The  programs  were  exe¬ 
cuted  on  the  two  machines  and  the  number  of  inter¬ 
preted  instruction  bits  was  recorded.  These  counts 
exclude  all  input/output  handling. 

The following text files were used for input
during the dynamic measurements (Table 2).

File     Content                    No of ASCII   No of lines
                                    symbols

TEXT0    extract from report            580           13
TEXT1    extract from report           4752           98
TEXT2    extract from report           3806           99
TEXT3    source code, C language       1038           51
TEXT4    mail address list             5697          100
TEXT5    mail address list             1139           20


Table 2. Input data

The ENTAB and COMPRESS programs were run using
files TEXT1 - TEXT4 as input. PUTDEC uses no input
file; instead, the main routine makes 36 calls on
the conversion procedure.

The sorting programs were used to sort the lines
of TEXT1, TEXT3, and TEXT4, and also an already
sorted version TEXT4S of TEXT4, resulting in worst-case
performance.

Finally, the FIND programs were run using a collection
of 19 search patterns and the input files
TEXT0, TEXT3, and TEXT5. Three groups of measurements
were performed. Group 1 uses simple search
patterns consisting of single literal strings.
Group 3 uses complicated search patterns, and group
2 falls in between groups 1 and 3.

As in the selection of the test programs themselves,
the aim in the selection of test data was
to achieve realistic and typical conditions with a
reasonable amount of effort.


Program           Data           Result     Result    LAX2/PDP11
                                 PDP-11     LAX2      in %

ENTAB             TEXT1-4          7105      4357        61
COMPRESS          TEXT1-4          9599      7308        76
PUTDEC            -                 130      46.3        36
QUICKSORT         TEXT1,3,4        2942       421        14
QUICKSORT         TEXT4S           6000       558         9
QUICK.REC         TEXT1,3,4        2925       354        12
QUICK.REC         TEXT4S           5925       516         9

FIND: PATTERN     all patterns      667       187        28
FIND: MATCH       group 1          8323      2768        33
                  group 2         16588      4778        29
                  group 3         23286      6166        26
                  group 1-3       48197     13712        28

FIND.OPT: PATTERN all patterns     (667)      121        18
FIND.OPT: MATCH   group 1         (8323)     68.8         1
                  group 2        (16588)     1102         7
                  group 3        (23286)     6630        28
                  group 1-3      (48197)     7801        16


Table 3. No of interpreted instruction bits
(unit: 1000 bits)
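
The group rows of the MATCH measurements in Table 3 can be cross-checked against the "group 1-3" rows. In the sketch below the PDP-11 figures for groups 2 and 3 are taken as 16588 and 23286, which are the values that add up to the printed total of 48197. The tuple layout and the helper name `percent` are assumptions made for this check only.

```python
# Consistency check of the FIND: MATCH rows of Table 3 (unit: 1000 bits).
# Values are copied from the table; the tuple layout is an encoding chosen
# for this check only.

match_rows = {              # group: (PDP-11, LAX2 FIND, LAX2 FIND.OPT)
    "group 1": (8323, 2768, 68.8),
    "group 2": (16588, 4778, 1102),
    "group 3": (23286, 6166, 6630),
}

pdp11_total = sum(row[0] for row in match_rows.values())
find_total = sum(row[1] for row in match_rows.values())
opt_total = sum(row[2] for row in match_rows.values())


def percent(lax2, pdp11):
    """LAX2 bits as a percentage of the PDP-11 bits, rounded."""
    return round(100 * lax2 / pdp11)


print(pdp11_total, find_total, round(opt_total))
print(percent(find_total, pdp11_total), percent(opt_total, pdp11_total))
```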


Table 3 summarizes the result of the dynamic
measurements. For FIND, measurements were taken separately
on the pattern building part (PATTERN) and
the pattern matching part (MATCH).

The superiority of the high-level architecture
is evident. Summing all measurements (omitting the
non-recursive QUICKSORTs), we get the overall
figure 28% for the ratio of LAX2 to PDP-11. However,
the variation across the programs is high, and the
result depends on the test data used.

The case of the optimized MATCH shows highly
favourable values for LAX2, especially for group 1.
The main explanation is that the search patterns of
group 1 consist of single literal strings, allowing
the search to be performed by the substring searching
'part' instruction. Likewise, the patterns of
group 2 consist of literal strings followed by
other constructs, so part of the search can be
speeded up as in the case of group 1. The sorting
programs also perform better than average on LAX2,
mainly due to the use of string comparison instructions
built into LAX2.


The least favourable case for LAX2 is the
COMPRESS program. A closer look shows that this is
the program with the lowest frequency of procedure
calls. Procedure calling is more efficient in the
high-level machine than in PDP-11, and the generality
of the call-return sequence produced by the C
compiler emphasizes the difference. Code optimization
across procedure boundaries can be expected to
improve the PDP-11 results in some cases. Such
optimization is however a complex task.


Discussion 

An objection to the results presented is that
the influence of data storage and accessing has
been neglected. Additional questions may be raised
concerning the relevance of the number of
interpreted instruction bits as an architectural
measure. These issues will now be discussed. In
addition, the present work will be related to similar
published investigations.

The volumes displayed in Table 1 do not include
storage allocated dynamically during execution. How
would this dynamic storage requirement affect the
comparison? A look at the test programs shows that
the effect is small. Only the sort programs can
allocate more than in the order of 10 words. The
sort programs use one more word per line of input
in LAX2 than in PDP-11, due to the use of type-and-length
descriptors. With the test data used this
amounts to a 6% increase in data storage. The stack
frames in LAX2 are smaller than those used by the
PDP-11 code. The influence of this difference is
small, however. The recursive sort programs grow a
stack whose depth is only log2(n) frames, where
n is the number of input lines.

The number of interpreted instruction bits has
been shown to be small for LAX2 (Table 3). It might,
however, be suspected that the number of memory
references during access to data is higher for LAX2
than for the conventional machine, since each composite
value is equipped with a one-word descriptor.
Unfortunately no mechanism was available for monitoring
this effect. Inspection shows that in the
case of the test programs the descriptor references
would add well below 5% to the execution time. The
actual figure is of course quite implementation
dependent; for instance, a cache memory would probably
almost eliminate the overhead.

The number of interpreted instruction bits (NIB)
is an architectural measure clearly related to execution
speed. Small NIB values mean that little
time is spent in fetching instructions. However,
the complexity of the decoding process must also be
considered. In the case of LAX2 vs PDP-11 the latter
factor seems to be of small importance.

Given a physical implementation of an architecture,
one would expect the execution times of programs
to be proportional to their NIB values.
However, the accuracy of this correspondence depends
on the homogeneity of the instruction set, i.e. the
degree to which the instructions all "do the same
amount of work". In particular, the effect of
vector instructions has to be taken into account.
Like many other high-level architectures LAX2 has
instructions operating on variable length data, in
particular strings. If such iterative or vector
instructions are used frequently and on large data
items, then clearly the number of interpreted instruction
bits will give too optimistic a view of
physical execution time.

To estimate this effect the use of vector instructions
in the test programs was investigated.
The programs ENTAB, COMPRESS, and PUTDEC make negligible
use of vector instructions. The sorting
programs compare text lines by means of the vector
instruction 'string compare'. With the test data
used the average number of iterations (character
comparison steps) performed per such instruction is
as low as 3. The relative frequency of the instruction
is 5%. Let us assume, rather arbitrarily,
that each iteration counts as three normal, i.e.
non-vector, instructions. Assume further that all
instructions are of the same length in bits - which
is close to being true. Then we arrive at a prolongation
factor of 1.4 due to the use of vector instructions.
That is, to get a more realistic measure
of expected execution time, add 40% to the results
in Table 3 in the case of the sort programs.

The non optimized FIND program makes less frequent
use of vector instructions. The optimized
FIND.OPT: MATCH, however, uses the substring
searching instruction 'part'. For literal string
patterns (group 1) we find the relative frequency
of 'part' to be 2% and the average number of iterations
to be close to 50. This gives a prolongation
factor of close to 4.
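
The text does not spell out the formula behind the two prolongation factors. One reading that reproduces both quoted values is sketched below: a fraction f of the executed instructions are vector instructions, each performing n iterations on average, and each iteration is counted as w normal instructions. The function name and parameter names are hypothetical, and the formula itself is an assumption rather than the author's stated method.

```python
# Prolongation factor caused by vector (iterative) instructions, following
# the reasoning in the text: a fraction f of the executed instructions are
# vector instructions, each performing n iterations on average, and each
# iteration is counted as w normal instructions (w = 3 is the text's
# admittedly arbitrary choice). All instructions are assumed equally long.

def prolongation_factor(f, n, w=3):
    """Expected execution-time factor relative to the plain NIB count."""
    return (1 - f) + f * n * w


sort_factor = prolongation_factor(f=0.05, n=3)    # 'string compare'
part_factor = prolongation_factor(f=0.02, n=50)   # 'part', group 1 patterns

print(round(sort_factor, 2), round(part_factor, 2))
```

Under this reading the sort programs give 0.95 + 0.05 x 9 and the group 1 searches give 0.98 + 0.02 x 150, in agreement with the factors of 1.4 and close to 4 quoted in the text.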

These  findings  correlate  well  with  the  results 
of  Table  3  but  do  not  fully  account  for  the  high 
superiority  of  LAX2  in  the  cases  discussed.  The 
remaining  cause  seems  to  be  that  the  PDP-11  versions 
are  more  heavily  burdened  with  subroutine  linkage 
than  the  LAX2  versions,  where  certain  subroutines 
have  been  replaced  by  vector  instructions. 

As mentioned in the Introduction the LAX2 machine
has been implemented as a partially microcoded
interpreter on the Varian V73 minicomputer. The
volume of the microcode is 180 64-bit words, and
the remainder of the interpreter consists of
approximately 7K 16-bit words of V73 machine code.
Thus the microcoded part is small. Execution times
on the two machines were measured for the test programs.
The execution time ratio of LAX2-V73 to
PDP-11/45 varies from 14 to 0.3 with 6-8 as typical
values. These figures are quite satisfactory, considering
the usual slowdown due to software interpretation.
The hardware characteristics of the two
minicomputers are roughly equal.

Finally a comparison of the present work with
similar published investigations.

Wilner (8) has evaluated volumes of Fortran and
Cobol programs on Burroughs B1700 using language-oriented
instruction sets, in comparison to IBM
System S/360 (and Burroughs B3500). The results
show improvements by a factor 2 to 3, larger than
the factor of about 1.5 for LAX2 compared with
PDP-11. This is however not surprising; S/360 code
is less compact than PDP-11 code.

Wortman compared the Student PL Machine of
his own design with the S/360. A large number of
small student programs were used as test cases.
Several dynamic and static measures were evaluated.
The results show a twentyfold superiority for his
machine in number of instruction bits, both in the
static and dynamic sense. However, it should be
noted, first, that his S/360 programs were produced
by the standard PL/I(F) compiler, and secondly,
that all runtime checks built into his high level
architecture are also included in the S/360 versions.
These checks include the PL/I(F) conditions
'subscript range', 'overflow', and 'stringrange'.
This is in contrast with our investigation, where
such checks are indeed performed by the LAX2
machine but not by the PDP-11 program versions.

Nielsen (11) compared a proposed high-level
language architecture for the SPL language, a
high-level language with special provisions for
expressing vector and matrix computations, with the
Honeywell HDC-701P aerospace computer. The high-level
architecture versions of a set of benchmark
routines were found to require 19% fewer program
bits than carefully coded assembly language versions.
A timing analysis showed that the high-level
architecture programs could be expected to require
14% less execution time.

Tafvelin and Wikström (12) compared a proposed
high-level language architecture for the machine
oriented high-level language Mary with IBM S/360.
A set of seven programs was used, with a total
S/360 volume of 42000 bits. The main result is that
program size is reduced almost by a factor of 3.
This is partially attributable to a sophisticated
addressing scheme called "refined display" used in
their architecture. No dynamic results are given.

The work by Tanenbaum (5) has already been mentioned.
His EM-1 architecture shares several properties
with LAX2 but does not have the application
orientation of the latter. The performance evaluation
he reports is based on a small amount of data.
All performance figures concern static code size.
Apart from isolated statements and programming
constructs he treats only four small programs.
Their total size on the PDP-11 is 3776 bits and on
EM-1 47% of this figure.


Conclusion 

The reported work has given yet another example
of the superiority of high-level architecture,
designed from language and application considerations,
over conventional architecture. The evaluation
was partial - the only examined properties
were program volume and number of interpreted instruction
bits. These quantities were evaluated
using a set of complete, realistic programs from a
well-known source.

The  following  features  contribute  significantly 
to  the  shown  superiority  of  the  high-level  archi¬ 
tecture  : 

-  efficient  subroutine  support 

-  structured  memory,  short  addresses 

-  application  oriented  data  types  and  operations. 

As stated in the Introduction the goals for the
LAX2 design include low cost for software production.
The high-level architecture supports this
goal by:

-  eliminating  concepts  from  low-level  programming 
such  as  registers,  primitive  addressing, 
pointer  arithmetic,  and  goto  statements 

-  easing  the  compilation  process  (the  basic  com¬ 
piler  is  available  as  a  machine  instruction) 

-  providing  extensive  run-time  protection. 

We are convinced that these properties significantly
promote programmer productivity as well as
the reliability of the software produced. The continuing
rise of the ratio of software cost to hardware
cost emphasizes the importance of such "soft"
advantages of high-level architecture. Unfortunately
they are hard to quantify. To do so for LAX2
would require, in the first place, more practical
experience with the machine than is available today.


Appendix A

ELAX Instruction Summary

Of the 256 available byte values, the ones in the
upper half are reserved for producers and locators
(byte values in hexadecimal):

80-9F: producers, stack variables
A0-BE: producers, own variables
C0-DF: locators, stack variables
E0-FE: locators, own variables.
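
The partition of the upper half of the opcode space can be sketched as a small classifier. The own-variable ranges are read here as A0-BE and E0-FE, that is 31 values each, matching the limit of 31 own variables; this reading of the printed ranges is an assumption. The function name, the returned tuple and the zero-based variable numbering are also hypothetical.

```python
# Sketch of classifying an ELAX byte in the upper half of the opcode space,
# using the ranges listed above. The returned tuple layout and the function
# name are hypothetical. Bytes outside the listed ranges return None.

RANGES = [
    (0x80, 0x9F, "producer", "stack"),
    (0xA0, 0xBE, "producer", "own"),
    (0xC0, 0xDF, "locator", "stack"),
    (0xE0, 0xFE, "locator", "own"),
]


def classify(byte):
    """Return (kind, variable_class, variable_number) or None."""
    for low, high, kind, var_class in RANGES:
        if low <= byte <= high:
            # Zero-based numbering within the range is an assumption.
            return kind, var_class, byte - low
    return None


print(classify(0x83))   # a producer for a stack variable
print(classify(0xE0))   # a locator for an own variable
print(classify(0x2F))   # not a producer or locator
```

Because the variable number is folded into the opcode byte, a reference to a local or own variable costs a single byte, which is what keeps the sequences in Figure 5 so short.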

A producer pushes the value of a variable on the
stack. In the case of a composite value, only its
head is pushed.

A locator locates the place of a variable and initiates
a locator-sequence. The latter is composed as
described by the regular expression
locator pursuer* (catcher | effector)

The opcodes used for pursuers, catchers, and effectors
are in the interval 00-2F, and the same opcodes
are also used for other instructions. This
is possible since the same instruction cannot
occur both within and outside a locator-sequence.
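
The regular expression for a locator-sequence can be checked mechanically. In the sketch below each instruction is represented only by the name of its class; this encoding, the one-letter codes and the function name `is_locator_sequence` are assumptions made for illustration.

```python
import re

# Validate the shape of a locator-sequence against the regular expression
# given above: locator pursuer* (catcher | effector). Instructions are
# represented by their class names; this encoding is an assumption.

CLASS_CODE = {"locator": "L", "pursuer": "P", "catcher": "C", "effector": "E"}
LOCATOR_SEQUENCE = re.compile(r"LP*(C|E)")


def is_locator_sequence(classes):
    """True if the list of instruction classes forms one locator-sequence."""
    encoded = "".join(CLASS_CODE[c] for c in classes)
    return LOCATOR_SEQUENCE.fullmatch(encoded) is not None


print(is_locator_sequence(["locator", "effector"]))            # e.g. locate A, store
print(is_locator_sequence(["locator", "pursuer", "catcher"]))
print(is_locator_sequence(["locator", "pursuer"]))             # no terminator
```

Since every sequence is opened by a locator and closed by exactly one catcher or effector, a decoder always knows whether it is inside a locator-sequence, which is what allows the opcodes 00-2F to be reused.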

Pursuers  enable  remote  accessing.  The  three  main 
pursuers  are: 

'Pcomp': The located value must be a string (, realarray)
or node v. An operand i of type index is
required (on the stack). The i'th component of
v becomes located.

'Pfirst': The located value must be a string (, realarray)
or node v. The first component of v becomes
located.

'Pown': The located value must be a prog p. An operand
i of type index is required. The i'th own
variable of p becomes located.

Catchers push a value on the stack. The three main
catchers are 'Ccomp', 'Cfirst', and 'Cown', cf. the
producers above. The value produced is that of a
component or an own variable, respectively.

Effectors  are  categorized  as  basic  effectors, 
string  effectors,  and  special  effectors.  The  basic 
effectors  are: 

'clear': writes the index value 0.

'scratch': writes the value nil.

'store': writes a value popped from the stack.

'plus', 'minus' (only for index values): adds, resp.
subtracts, a value popped from the stack.

'incr', 'decr' (located value must be index): increments
by 1, resp. decrements by 1.


The constant nil and the boolean constants true
and false have one-byte representations.

String constants: The empty string has a one-byte
representation. Other strings have a (n+2)-byte
representation, where the first byte is an opcode,
the second contains n, and the remaining bytes
the character codes of the string (1≤n≤255).

Decimal constants: See ref. (2).

(Real constants: Planned, see ref. (1).)

The remaining data types (see Fig. 1) have no
constants.
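
The (n+2)-byte layout of string constants can be sketched as follows. The two opcode values are hypothetical placeholders, since the summary does not give them, and one byte per character is assumed for the character codes.

```python
# Hypothetical opcode values: the instruction summary does not state them.
EMPTY_STRING_OPCODE = 0x30
STRING_OPCODE = 0x31


def encode_string_constant(text: str) -> bytes:
    """Encode a string constant as opcode, length byte n, then n character codes."""
    if len(text) == 0:
        # The empty string has a one-byte representation.
        return bytes([EMPTY_STRING_OPCODE])
    if len(text) > 255:
        raise ValueError("a string constant holds at most 255 characters")
    codes = text.encode("latin-1")  # assumption: one byte per character code
    return bytes([STRING_OPCODE, len(codes)]) + codes
```

Under these assumptions a three-character string occupies five bytes, in agreement with the (n+2) rule.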

The instruction class computors contains instructions
taking a number of values from the stack
and producing a value on the stack. These instructions,
like the constants, are side-effect-free.
Subclasses of computors include binary operators,
unary operators, binary predicates, unary predicates,
converters, and creators. All computors have
a one-byte representation.

The binary operators are '+', '/', and 'modulo'.
They are defined for boolean, index,
decimal (and real) operands.

The unary operators are 'negate', defined for boolean,
decimal (and real) operands, and 'abs', defined
for decimal (and real) operands. ('truncate'
and 'round' are planned for reals.)

The binary predicates are 'same', 'diff' for comparison
of heads of composite values, and six relational
predicates, defined for operands of types
boolean, index, decimal (, real), and character and
string. - Here, as with most other instructions,
a character is regarded as a string of length one.

The unary predicates are 'letter' and 'digit' for
character operands, and 'bad', yielding true if and
only if its operand is nil, and 'good' - the negation
of 'bad'.

Converters convert from one data type to another.
In essence, direct conversion is possible between
character and index, between index and decimal
(, between decimal and real, and between index and
real). The converter 'length' produces the length
of a string, node, decimal, prog (or realarray).

The  creators  create  a  new  composite  value  (head  on 
stack, body on heap). They are 'create', to create
a  string  or  node  (or  realarray)  of  specified  length, 
'copy',  to  produce  a  copy  of  a  composite  value, 
'substring',  to  produce  a  substring  from  specified 
positions  in  a  string,  and  'cat',  to  produce  the 
concatenation  of  two  strings. 


Before summarizing the string effectors some other
classes of instructions will be treated.

Constants are instructions pushing a value described
by the instruction itself on the stack.

Index constants: Values 0-10 are represented by the
byte values 00-0A. Values 11-255 are represented
by two-byte instructions. Values 256-16383 are
represented by three-byte instructions.

Character constants: Represented by two-byte instructions,
where the second byte contains the
character code.
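
The size rule for index constants can be stated as a small function. This is a sketch of the length rule only; the function name is hypothetical, and the opcodes of the two-byte and three-byte forms are not given in the summary and are therefore not modelled.

```python
def index_constant_length(value: int) -> int:
    """Return the number of instruction bytes needed for an index constant."""
    if not 0 <= value <= 16383:
        raise ValueError("index constants cover the values 0-16383")
    if value <= 10:
        return 1  # the byte values 00-0A stand for the constants themselves
    if value <= 255:
        return 2
    return 3
```

The one-byte form is the only one whose encoding is fully specified in the text: the constant 7, for example, is simply the byte 07.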


The string effectors have the following in common:

- The located value must be an index v.

- At least one operand, a string s = s_1...s_n, is
required.

- v must be less than n.

The effector treats the string segment s_{v+1}...s_n and
will normally increase the value of v as a side
effect. The aim has been to enable convenient
and efficient sequential processing of strings.

The string effectors are categorized as predicate
effectors, pass effectors, locate effectors,
get effectors, and put effectors. Descriptions of
the individual instructions can be found in ref.
(1). Here we can only offer an enumeration of them;
hopefully their names give some hints of their
meanings.

Predicate  effectors: 

'prefix',  'part',  'subequ' 

Pass  effectors: 

'pass', 'paslet', 'pasdig', 'pasletdig'

Locate  effectors: 

'locate',  'loclet',  'locdig',  'locletdig' 

Get  effectors  (get  value  from  string): 

'getindex', 'getchar', 'getident', 'getdec',
('getreal',) 'getstring'

Put  effectors  (put  value  into  string): 

'putnext',  'putpart' 

The next instruction class of interest is the jumps.
All jumps are generated from high-level control
structures during the TLAX-ELAX compilation. These
include, in short:

if - then - else : generates forward jumps
case : generates jump table, an indexed jump,
and forward jumps
do - od : generates a backward jump
exits from do-od : generate forward jumps.

In addition, the control structure suggested by
Zahn (C T Zahn, A control statement for natural
top-down structured programming, Programming Symp.
Proc. 1974 (Ed: B Robinet), Springer, 170-180)
is implemented in LAX2.

All  jumps  are  within  progs  and  relative;  distances 
are  coded  in  one  or  two  bytes.  In  total  22  opcodes 
are  allocated  to  jumps. 

Additional  instructions  controlling  the  flow  of 
computation  are: 

'exec', 'return': for ordinary procedure (prog)
activation,

'init', 'attach', 'detach', 'resume', 'call': used
in connection with coroutines (coprogs).

'exit': for abandoning the current computation
and reinitialization of the LAX2 process.

LAX2 supports frequency measurements during execution.
So-called counters can be placed at arbitrary
points in programs; they are (if enabled)
automatically incremented each time they are passed
during execution. Instructions exist for operating
the counters.

Fixprograms are protected programs created at the
initialization of a LAX2 process. Some of them are
automatically activated by different runtime error
events. There are also instructions for activating
fixprograms from other programs.

The 'compile' instruction, invoking the TLAX-ELAX
compiler, is implemented partially as hidden
LAX2 programs. There exist special ELAX instructions
only available to these programs, cf. ref. (2).

A set of input/output instructions is described
in ref. (2).

Abstract

The complexity, in space and in time, of
directly interpreting serial, block structured,
high level languages is examined. On the basis
of this study, it is apparent why it is undesirable
to directly interpret high level languages.
A systematic procedure is developed for the
design of well-matched intermediate languages
for supporting high level languages.


With the steadily increasing emphasis upon
the construction of structured, reliable and
maintainable software, the trend is toward the
use of suitable high level languages (HLLs) in
preference to machine or assembly level languages.
The computer architect, thus, is faced
with the task of designing a conducive environment
for the execution of HLL programs. This
is a shift, in perspective at least, away
from the traditional role of the computer
architect; no longer is it appropriate to
approach the design task at the machine language
level.

One viewpoint advocates the direct interpretation
of the HLL program, by an interpreter
implemented in either hardware, software or
firmware, e.g., [1,2,3]. The problems associated
with such direct interpretation have been
sketched in previous work [4,5] and will be
elaborated upon in this paper to demonstrate the
general undesirability of this approach. Thus,
it will be shown that most HLLs are not directly
interpretable by the space-time criteria that
are developed subsequently.

The alternative is to translate the HLL
program into an intermediate representation that
is directly interpretable. Such an intermediate
language is termed a directly interpretable
language (DIL) [5]. Currently, the DIL most
frequently used is the machine language of an
available computer. Unfortunately, all too
often, the machine language has not been designed
with the given HLL in mind, leading to significant
inefficiencies in time and space. It is of
interest, therefore, to understand and formalize
the design of a DIL that is well matched to a
given HLL and the relationship between the two.
Such a DIL could then either constitute the
instruction set architecture of a machine
dedicated to that HLL, or could be interpreted
by a universal host machine (UHM), i.e., a
machine which can interpret any DIL with equal
and relatively little difficulty. This paper
presents some preliminary results relating to
the properties of HLLs that disqualify them
from being DILs, the relationship between well-matched
HLLs and DILs and the process of
designing a DIL for a given HLL. Identifying
the essential characteristics of the universe
of DILs clearly is valuable in determining the
architecture of universal host machines.

^ This work was supported by the Joint Services

The primary motivation behind the search
for an ideal DIL is the desire to optimize the
space-time requirements of the interpretation
process. A secondary goal is to facilitate
the compilation process. Some interesting
space-time measures and analyses of "ideal"
intermediate languages have been developed by
Hoevel and Flynn [6]. In this paper an attempt
is made to approach the design of DILs in a
systematic, top-down fashion with no assumptions
as to what the end-product should look like.
Instead, it is dictated by a systematic methodology
that accepts as input a description of
the HLL and is guided by current technological
limitations.

The DIL design will be effected in this
paper by considering the issues and problems
involved in directly interpreting a HLL. By
resolving these problems via a systematic transformation
process, the target DIL will be
derived. Although no specific host hardware
descriptions are considered during the design,
such a DIL should (by the definition of a DIL
[5]) be one for which it is technologically
feasible to build a hardwired interpreter. In
other words, it should be possible to view the
target DIL as a machine language for a hypothetical
computer with certain basic, practically
feasible data and control structures. Such
specific implementation considerations will be
discussed in one of the later sections.

2. A Model of Interpretation

In this section, we shall present a conceptual
model of the process of (direct) interpretation
of a serial HLL. Some of the main features
of the interpretive process will then be illustrated
in terms of this model and a specific
example high level language. Figure 1 presents
the syntax and semantics for some of the productions
of our example HLL. The syntax is specified
in a context free BNF metanotation; the semantics
corresponding to each production are specified
in a semi-formal manner. If not originally so,
the source context free grammar (CFG) specification
is assumed to have been converted to an
equivalent ε-free form. The algorithmic methods
of achieving such a conversion are well known [7]
and are not discussed here. The HLL program of
Figure 2 will be used as a working example.

Our conceptual model of interpretation
draws heavily upon the concepts in Johnston's
Contour Model [8] and Knuth's approach to
specifying the semantics of programming languages
[9]. It consists of four concurrent, interacting
processes:

1. Lexical Analyzer: This process is a string
to string transducer which converts the input
alphanumeric string into an output string of
tokens corresponding to lexemes. The function,
operation and complexity of this process are
relatively well understood and will not be
considered further in this paper.

2. Syntactic Analyzer: This phase of interpretation
(also known as parsing or recognition)
is in essence a string to tree transduction
process, where the string of tokens emitted by
the lexical analyzer is converted into a (parse)
tree using some convenient parsing strategy.

3. Static Semantic Analyzer: This process is
the one which operates on the tree being built
by the syntax analyzer by associating with each
node the relevant semantic information needed
to be able to perform the actions called for by
the program semantics. Any propagation of
attributes (up and down the tree) required to be
performed in order to fully specify the attributes
(and hence, the semantic actions) of each
node, has to be carried out by this analyzer [8].
Nodes or subtrees deemed useless (i.e. after all
relevant attributes have been made use of or
transmitted to the root of the subtree) are
discarded as the analysis proceeds. This process
does not, itself, perform the actions indicated
by the program. It merely gathers the information
needed and sets up the next process. All data-independent
actions that can be performed by
analyzing the source program alone are in the
realm of the static semantic analyzer.


4. Dynamic Semantic Analyzer: This process
actually performs the semantics of the program,
by executing the semantic actions associated
with each node of the tree. Subtrees are discarded
as soon as the relevant semantic actions
have been executed and the attributes are no
longer needed by the static semantic analyzer.

It is important to note that the four processes
listed above run in a mutually interlocked
manner such that each process gets ahead of the
next one in sequence only to the extent necessary
for the latter to operate. The controlling process
is the dynamic semantic analyzer whose
actions are specified by the statements following
the label "Dynamic actions" in the definition of
the semantics in Figure 1. In performing its
function, it must make use of certain attributes,
termed S-derived, which are evaluated by the
static semantic analyzer. S-derived attributes
are defined to be those attributes which can be
derived by an analysis of the program text
(i.e., input data independent). The derivation
of these attributes is specified in Figure 1 in
an assertive rather than an imperative manner,
i.e., their relationship to other attributes is
specified instead of a series of statements the
execution of which would assign to them their
correct value. The manner in which they are
derived is deliberately left unspecified. It is
implicitly understood that the dynamic semantic
analyzer forces the static semantic analyzer to
proceed just far enough that the needed S-derived
attributes have been evaluated. The syntax
analyzer has a pointer, SYN, into the string of
lexemes emitted by the lexical analyzer, that
points one lexeme beyond the (minimum) amount of
the string that the syntax analyzer must have
consumed so as to set up enough of the syntax
tree for the static semantic analyzer to perform
its function. The syntax tree is necessary
since the S-derived attributes are necessarily
defined in the context of this tree. Generally,
the lexical analyzer's pointer, LEX, into the
alphanumeric string will correspond exactly to
SYN. Assume the dynamic semantic analyzer is
executing the semantics of the node labelled
(Block) in Figure 2. This requires knowledge
of the number of declarations in the outermost
block. To determine this, the static semantic
analyzer requires that all the declarations in
the outermost block be parsed. Consequently, LEX
will be at the "x" immediately following
"integer xi".

The manipulation of SYN and LEX is, by and
large, implicit. In the case of loops, conditionals,
procedure calls and returns, the
dynamic actions explicitly alter LEX (and
consequently SYN) by a statement of the form
"Parse (u,v)" or "Parse and Process (u,v)" where
u identifies a character in the program text by
its memory address and v is a non-terminal which
serves as the goal for the parser. In the case
of procedure calls, the current value of LEX is
saved explicitly.

In Figure 1, attributes labelled D-derived
are evaluated by the dynamic semantic analyzer.
An S-derived attribute is termed COPIED if it is
merely the copy of an attribute elsewhere. An
attribute is INHERENT if its value is an inherent
property of that node. In addition, the type
of the attribute (INTEGER, REAL, POINTER, etc.)
is specified. Figure 1 clearly demonstrates
the complexity of the procedure call and return
(see productions 10 and 24). Note also that
production 21 requires that the text to be skipped
be parsed, even though it will not be executed,
just to determine where the (Stmt) or (Simpstmt)
ends.
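
The demand-driven behaviour of the LEX pointer described in this section can be sketched in a few lines. The program text, the token pattern and all names below are assumptions made for illustration; the example language of Figures 1 and 2 is not reproduced. The sketch shows that counting the declarations of a block forces the lexical pointer past every declaration, and no further.

```python
import re

# Identifiers, numbers, ':=' and single punctuation characters (assumed token set).
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|:=|\S)")


class Lexer:
    """Produces lexemes on demand; `pos` plays the role of the LEX pointer."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Look at the next lexeme without advancing the pointer."""
        match = TOKEN.match(self.text, self.pos)
        return match.group(1) if match else None

    def next(self):
        """Consume the next lexeme and advance the pointer."""
        match = TOKEN.match(self.text, self.pos)
        if match is None:
            raise SyntaxError("unexpected end of program text")
        self.pos = match.end()
        return match.group(1)


def count_declarations(lexer):
    """Parse just far enough to know how many declarations open the block."""
    if lexer.next() != "begin":
        raise SyntaxError("block must start with 'begin'")
    count = 0
    while lexer.peek() == "integer":
        lexer.next()  # the keyword 'integer'
        lexer.next()  # the declared name
        if lexer.next() != ";":
            raise SyntaxError("declaration must end with ';'")
        count += 1
    return count


program = "begin integer x; integer y; x := 1; y := x end"
lexer = Lexer(program)
declarations = count_declarations(lexer)
remaining = program[lexer.pos:].lstrip()
```

After the call, `remaining` begins at the first statement of the block: all declarations have been scanned, but none of the statements have.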


3. Space and Time Requirements
for Interpretation

The model of interpretation developed in
the previous section may be used to obtain a
qualitative understanding of the time and space
involved in the direct interpretation of HLLs.
Although, in practice, the tree representation
would probably be discarded in favor of a more
compact representation such as a stack, the space
occupied by the tree is related by a factor of
proportionality and, so, is a good indicator of
the actual space requirements. The advantage
of the tree representation lies in its conceptual
simplicity which is uncluttered by extraneous
implementation issues.

The space requirements are five-fold:
(1) the space occupied by the program being
interpreted; (2) that occupied by the interpreter;
(3) that required to hold the portion of the syntax
tree that is currently in existence; (4) the
space needed to store the attributes associated
with the tree nodes; (5) the space occupied by
the parse stack which contains terminals and non-terminals
that have been scanned by the syntax
analyzer but are yet to be reduced. (This is
needed when a bottom-up parsing scheme is used.)

The total computation time for the interpreter is
the sum of the computation times for the individual
processes.

An obvious way of reducing the size of the
program being interpreted is to replace the
alphanumeric string representation of lexemes by
more efficiently encoded bit-strings during a
pre-processing step. As a result, the lexical
analysis process would be eliminated from the
interpreter thereby reducing the interpretation
time. On the other hand, no longer would one be
interpreting the original HLL directly; instead,
a closely related language would be the object
of interpretation. In this manner, by identifying
the problems associated with the direct
interpretation of the original HLL and by modifying
the HLL only to the extent absolutely
necessary to remove these problems, one obtains a
language that is as closely related to the original
as possible while possessing the property
of being directly interpretable. Pragmatically,
a language will be considered to be directly
interpretable if, in the context of current
technology and cost-functions, it is feasible and
desirable to directly interpret the language in
comparison to alternative strategies. Thus, the
demarcation between languages which are and are
not directly interpretable is vague at best and
may be expected to change with time.
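
The pre-processing step mentioned above can be illustrated by a minimal sketch, assuming a simple fixed-width code in which each distinct lexeme is given the next free integer. The lexeme pattern, the function names and the coding scheme are all assumptions; the paper does not prescribe a particular encoding.

```python
import re

# Assumed lexeme classes: identifiers, numbers, ':=' and single punctuation marks.
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|:=|\S")


def preprocess(text):
    """Replace each lexeme by a small integer code; return the codes and the table."""
    table = {}
    codes = []
    for lexeme in LEXEME.findall(text):
        # A lexeme seen for the first time receives the next unused code.
        codes.append(table.setdefault(lexeme, len(table)))
    return codes, table


def bits_per_token(table):
    """Width of a fixed-length bit-string able to distinguish every code in the table."""
    return max(1, (len(table) - 1).bit_length())


codes, table = preprocess("x := x + 1")
```

For this ten-character input the four distinct lexemes need only two bits each, and the interpreter would read codes rather than characters, so no lexical analysis remains at interpretation time.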


The space occupied by the interpreter is
related to its complexity. The dynamic semantic
analyzer is central to the interpreter and can,
at best, be made more efficient but cannot be
eliminated. As shall be shown subsequently, the
static semantic analyzer and the syntactic
analyzer can be eliminated by suitably modifying
the language.


The space requirements for the syntax tree
are best minimized by reducing the amount of the
tree that is in existence at any one time. This
corresponds to those nodes that have not yet
been processed and discarded by the dynamic
semantic analyzer. Whereas the objective must
be to prevent the syntax analyzer from getting
far ahead of the dynamic semantic analyzer
(to minimize the size of the tree present), there
are factors that will prevent the realization of
this goal; there are occasions when the dynamic
semantic analyzer, to perform its function,
requires information (attributes) that the static
semantic analyzer can provide only by looking
ahead in the tree, which in turn requires that
the syntax analyzer have proceeded far enough
ahead. The language must be altered to remove
such situations. These modifications, by
reducing the size of the tree, also reduce the
total number of attributes that must be stored
and, consequently, the amount of space needed for
this purpose.

The fifth space requirement depends upon the
parsing strategy that is selected (or imposed
by the grammar specification). The two broad
classes of parsing techniques are the top-down
and the bottom-up methods. Most parsing strategies
can be viewed as either one or the other
or a hybrid. With the top-down technique, the
production to be used is known when the syntax
analyzer's pointer into the string corresponds to
the left most terminal of that production (with
an optional look ahead of k). The input tokens,
therefore, may be consumed and acted upon as
they are encountered since their syntactic
significance is defined when they are first
encountered. In contrast, bottom-up techniques
know which reduction is to be applied only when
the syntax analyzer's pointer is at the token
which corresponds to the right most terminal of
the corresponding production (once again, with
an optional look ahead of k). In general, there
will exist a number of terminals (and non-terminals)
whose syntactic significance has not
yet been established (since the corresponding
right handles have not yet been encountered), but
which have been already scanned by the syntax
analyzer. Space is needed to store these items,
generally in the form of a stack. From this
point of view, a grammar suited to top-down
parsing is indicated.


With respect to interpretation time, there
is little that can be done to minimize the time
required by the dynamic semantic analyzer beyond
eliminating inefficiencies since the algorithm
embedded in the program must be executed. The
amount of computation performed by the static
semantic analyzer is reduced if the type of
attribute propagation can be matched to the parsing
strategy. Inherited (synthesized) attributes
can be handled easily with a top-down (bottom-up)
strategy. However, since both types of attributes
are generally involved, the best approach
is to explicitly provide certain crucial attributes
in the string, thereby implying a further
modification to the language.

Before discussing ways of reducing the time
expended in syntax analysis, it is instructive
to catalog the various reasons for the existence
of syntax with a view to totally eliminating the
syntax analyzer if possible.

1. Reliability. The major function of syntax
at this point is to restrict the user to a
set of strings that are meaningful to the
language processor.

2. Readability.

3. To remove static semantic ambiguities. The
procedure for deriving attributes is defined
in the context of the syntax tree which must,
therefore, be derived.

4. To remove dynamic semantic ambiguities.
Often the dynamic semantics of certain constructs
are defined by the syntax tree,
e.g., precedence relationships binding
operands to operators.

5. To permit an efficient parsing strategy.

In the case of a HLL, all of these points
are important and the syntax cannot be ignored;
nor can the syntax analysis be eliminated. If
the emphasis is placed on the last issue, that
of an efficient parsing strategy to reduce the
interpretation time, then it may be necessary,
as we shall see, to sacrifice some readability.
We shall do so to obtain a "high-ish level" DIL.

On the other hand, if we are interested in
a related "low level" DIL, i.e., one which is
compiled into and then interpreted but never
directly programmed in, then only issues 3
through 5 are relevant. Readability is clearly
unimportant and reliability is guaranteed since
the compiler will not pass any illegal programs.
If we further perturb the language so that the
semantics are defined independently of the
syntax, then syntax analysis is rendered useless
and may be discarded altogether. The interpreter
may now recognize a degenerate grammar (one with
very few productions) which essentially permits
any string of terminals. The syntax analysis for
such a grammar consists merely of checking for
illegal terminals.

Both the high-ish level DIL and the low
level DIL are closely related to the original HLL
by virtue of the systematic transformations
that are listed in the next section. The former
DIL may be viewed as a substitute for the HLL
if a directly interpretable HLL is deemed
essential. The latter DIL is best viewed as a
well matched intermediate language for the HLL.
It is clear that a number of DILs may be defined
that are intermediate between these two DILs.


4.  A  Design  Methodology  for  Directly 
Interpretable  Languages 


In the context of the previous discussion,
the following sequence of modifications (on the
high level language) may be used to arrive at a
directly interpretable language:

(a) Distinct syntactic tokens or left handles
(represented by underscored integers in this
paper, e.g. 1, 3) are inserted into all production
right-hand sides (Figure 3). This makes the
grammar LL(1), thus simplifying the top-down
syntax analysis phase.


In practice, all productions would not have
distinct left handles; only the productions
corresponding to the same non-terminal need have
distinct left handles. This would drastically
reduce the number of syntactic tokens needed to
six. However, in the interests of clarity, we
shall retain this redundancy. No changes to the
semantics are called for as a result of this step.
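
The effect of step (a) can be sketched with a toy grammar. The two productions below are hypothetical and are not those of Figure 3; they only show that, once every right-hand side begins with a distinct left handle, the parser selects the production from the single token under its pointer and never backtracks.

```python
# Toy grammar (an assumption for this sketch), with the left handles 1 and 2:
#   Expr -> 1 Expr Expr    (a sum of two expressions)
#   Expr -> 2 NUM          (a literal)
PRODUCTIONS = {
    1: ("Expr", ["Expr", "Expr"]),
    2: ("Expr", ["NUM"]),
}


def parse(tokens, goal, pos=0):
    """Parse `goal` starting at tokens[pos]; return (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    handle = tokens[pos]
    # One token of lookahead is enough: the left handle names the production.
    if handle not in PRODUCTIONS or PRODUCTIONS[handle][0] != goal:
        raise SyntaxError(f"no production for {goal} starts with {handle!r}")
    pos += 1
    children = []
    for symbol in PRODUCTIONS[handle][1]:
        if symbol == "NUM":
            if pos >= len(tokens):
                raise SyntaxError("literal expected")
            children.append(tokens[pos])  # a lexeme, consumed as it is met
            pos += 1
        else:
            child, pos = parse(tokens, symbol, pos)
            children.append(child)
    return (handle, children), pos


tokens = [1, 2, "4", 1, 2, "5", 2, "6"]  # stands for 4 + (5 + 6)
tree, end = parse(tokens, "Expr")
```

Each token is consumed and acted upon as soon as it is encountered, which is the top-down behaviour argued for in the previous section.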

(b) Each production right hand side is use-ordered
in accordance with the sequence of
semantic specifications attached to that production,
i.e., the terminals and non-terminals are
placed in the same order in which they are used.
Figure 4 shows the productions affected by this
step.

(c) Semantic tokens (integers with overscores:
~, ^ or v) are introduced at selected points in
the productions to indicate the need for semantic
actions. Of these, the first type of tokens
(e.g. 5, marked ~) calls for semantic action(s) which can
be performed without reference to a propagated
attribute. Such tokens can thus be scanned and
immediately acted upon. The second type (e.g. 3, marked ^)
references an attribute that is propagated from
a node which is to the right in the tree (right-to-left
attribute propagation), while the third
type (e.g. 6, marked v) uses an attribute obtained from the
left (left-to-right attribute propagation).
Figure 5 illustrates the effect of applying this
step to the selected productions.

(d) The second and third types of semantic
tokens (marked ^ and v) are replaced, in each
case, by a token of the first kind (marked ~)
followed by an explicit attribute (e.g. (numb)),
thereby eliminating the need to propagate attributes
at interpretation time. In the last two
steps a number of redundant semantic tokens have
been defined to enhance clarity. In practice,
this redundancy would be eliminated.

(e) All the original terminal symbols (e.g.
begin, end etc.) are deleted from the language
and the grammar. These symbols, it may be noted,
are totally redundant at this point, both
syntactically and semantically.


The final form of the DIL grammar at the end
of steps (a) through (e) is shown in Figure 6.


It is to be noted, in summary, that our
newly derived language (DIL) has the following
desirable properties:

1) Top down LL(1) parsing (with no backtrack)
is possible. Thus syntax analysis
is simple.

2) Close tracking between the three
interpretation subprocesses is possible,
resulting in minimum tree storage
requirements and overall speedup in the
semantic analysis phase.

3) Due to the closely matched HLL and DIL
grammars, a simple syntax-directed
translation scheme (SDTS) [10] may be adopted
for the translation phase.
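
Property 1 can be illustrated with a small sketch. The grammar below is hypothetical and is not the DIL grammar of Figure 6; the token names t1 to t4 and the function parse are assumptions introduced only for illustration. Because every alternative of a non-terminal begins with a distinct left-handle token, one token of lookahead selects the production and no backtracking is needed.

```python
# Hypothetical grammar (not the paper's): each alternative of a non-terminal
# starts with a distinct "left handle" token.
#   stmt -> t1 expr
#   stmt -> t2 stmt stmt
#   expr -> t3
#   expr -> t4 expr expr
GRAMMAR = {
    "stmt": {"t1": ["t1", "expr"], "t2": ["t2", "stmt", "stmt"]},
    "expr": {"t3": ["t3"], "t4": ["t4", "expr", "expr"]},
}


def parse(tokens, start="stmt"):
    """Return True if tokens derive from start; the parser never backtracks."""
    stack = [start]  # pushdown store of symbols still to be matched
    pos = 0
    while stack:
        symbol = stack.pop()
        if pos >= len(tokens):
            return False  # input exhausted while symbols remain
        look = tokens[pos]
        if symbol in GRAMMAR:
            # Non-terminal: the lookahead token alone picks the production.
            body = GRAMMAR[symbol].get(look)
            if body is None:
                return False
            stack.extend(reversed(body))
        elif symbol == look:
            pos += 1  # terminal matched, advance the input
        else:
            return False
    return pos == len(tokens)
```

For example, parse(["t2", "t1", "t3", "t1", "t4", "t3", "t3"]) accepts, while parse(["t1"]) rejects as soon as the input runs out.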

It is to be noted that minimizing the space
requirement for holding the DIL program has not
really been considered in listing the modification
steps. However, one might guess that the
price paid (in terms of increased program size)
for achieving the advantages listed above is
acceptable.

The language that we have just derived may
be used as a high level language in which
programming may be performed if the lexemes are
represented alphanumerically and the tokens are
represented by keywords. This will require the
reintroduction of the lexical analyzer. The
most unacceptable feature of this language lies
in having to explicitly specify the number of
lexemes that have to be branched over. The use
of labels, while making the language marginally
acceptable, would require the equivalent of a
one-and-a-half pass assembly phase. The language
would no longer be directly interpretable.

If we desire a language that is to be used
merely to be compiled into and then directly
interpreted, we can continue the transformation
process further. Since the need for attribute
propagation by the static semantic analyzer is
no longer present, syntax analysis at this
point is needed only for checking the syntactic
correctness of the program. If the DIL is not
to be used for direct programming, syntactic
checking is unnecessary, since any errors would
have been detected during the translation phase.
Adopting this point of view, we may proceed
to delete all tokens which are purely syntactic
(i.e., tokens that are only underscored) from
the DIL grammar of Figure 6. The result now
truly resembles an "assembly" language, in that
the program consists of a sequence of semantic
tokens, or "op codes". Figure 8 shows the program
with numerical tokens replaced by alphabetic
mnemonics. The simplest grammar that
will accept programs in this "assembly" language
is the trivial grammar shown in Figure 7, since
the absence of syntax checking implies that any
sequence of semantic tokens is acceptable to
the interpreter, even if semantically meaningless.
If the interpreter is based on this
grammar, the syntax analysis process becomes
degenerate. The grammar of Figure 6 (after
deleting purely syntactic tokens) is needed,
nevertheless, to permit the translation of the
HLL program into the "assembly" language in a
syntax-directed way.
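
A minimal sketch of an interpreter based on the trivial grammar follows. The mnemonics resemble those of Figure 8, but the semantics given to them here, the flat list encoding of the program, and the function name interpret are all assumptions made for illustration. The loop dispatches on each token as it is scanned and performs no syntax check, so a meaningless sequence is only detected, if at all, when an action fails at run time.

```python
def interpret(program):
    """Run a flat token sequence; an operand follows its op code in the list.

    Assumed semantics (not taken from the paper's figures):
      PUSHI n    push the literal n
      PUSHVAL a  push the value of variable a
      ASSIGN a   pop the top of stack into variable a
      ADD        pop two values and push their sum
      HALT       stop
    """
    stack, variables, pc = [], {}, 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == "PUSHI":
            stack.append(program[pc])
            pc += 1
        elif op == "PUSHVAL":
            stack.append(variables[program[pc]])
            pc += 1
        elif op == "ASSIGN":
            variables[program[pc]] = stack.pop()
            pc += 1
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown op code: {op!r}")
    return variables
```

Under these assumptions, interpret(["PUSHI", 1, "PUSHI", 2, "ADD", "ASSIGN", "x", "HALT"]) leaves x bound to 3, whereas the syntactically unchecked program ["ADD"] fails only when the empty stack is popped.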

In actual practice, some minimum amount of
syntax checking may be desirable even at the
"assembly" language level, in which case the
grammar specification would be intermediate
between the two "extremes" of Figure 6 (full
syntax checking capability) and Figure 7 (no
syntax checking).

5. Technological Constraints
and Implications

Various assumptions regarding the available
hardware and software technology have been
implicit up to this point. These assumptions
will now be discussed. Firstly, it is assumed
that the best technique for the construction of a
parse tree is through the use of a pushdown
automaton. (Compiler theory offers no better
alternative.) Hence, syntax analysis will
necessarily be time-consuming unless the grammar
is LL(1).

It is assumed that the large scale use of
associative memory will not be cost-effective or
acceptable. Hence, information must be represented
by data structures that support searching.
For instance, the association of an identifier
reference to the corresponding declaration
(to obtain attributes) would clearly be facilitated
by the use of associative memory. In the
absence of associative memory, this information
must be maintained in data structures (hash
tables, linear lists, etc.) which simplify the
search. Furthermore, since such searches are,
at best, relatively slow, it is preferable to
provide explicit attributes in the program which
convert the associative search to a well-defined
look-up procedure. In the previous example, the
identifier reference should be replaced by two
attributes consisting of the specification
(relative to the current contour) of the contour
containing the variable and the ordinal number of
the identifier declaration amongst the set of
declaration attributes attached to the corresponding
block node (i.e., an address couple).
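
The look-up procedure implied by an address couple can be sketched as follows. The class Contour, the functions resolve, load and store, and the representation of the couple as (number of enclosing contours to skip, ordinal of the declaration) are assumptions chosen for illustration; the paper specifies only that the first component is relative to the current contour. No identifier name is searched for at interpretation time.

```python
class Contour:
    """One block activation: a link to the enclosing contour plus its slots."""

    def __init__(self, enclosing, n_slots):
        self.enclosing = enclosing        # statically enclosing Contour, or None
        self.slots = [None] * n_slots     # one slot per declaration, in order


def resolve(current, couple):
    """Follow `hops` enclosing links, then select a slot by ordinal number."""
    hops, ordinal = couple
    contour = current
    for _ in range(hops):
        contour = contour.enclosing
    return contour, ordinal


def load(current, couple):
    contour, ordinal = resolve(current, couple)
    return contour.slots[ordinal]


def store(current, couple, value):
    contour, ordinal = resolve(current, couple)
    contour.slots[ordinal] = value
```

With an outer contour of two slots and an inner contour of one slot, store(inner, (1, 1), 10) writes the second variable of the outer block by following one link and indexing once.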

Also, it is not evident how a tree structure
may be implemented in hardware whereas stacks are
readily implementable either in hardware or in
software. Thus, wherever possible, tree structures
must be replaced by stacks. The sub-tree
corresponding to <exp> can be supported by an
evaluation stack. If this is done, the semantics
associated with certain productions in the grammar
must be altered and be expressed in terms of
stack operations. If the block retention rules
of the language permit (as is the case in our
example language), the contour nodes may be
maintained on a contour stack and the associated
declaration attributes may be allocated space on
an allocation stack. As in the Burroughs machine
[11], the three stacks may be combined (with a
slight attendant increase in complexity).

6. Discussion

The undesirability, in space and time, of
directly interpreting most HLLs stems from the
need to do syntax and static semantic analyses.
Various factors contribute to this need and it
has been shown how they can be eliminated to
yield a directly interpretable language. The DIL
that is obtained is not unique; two DILs, a
low-level one and another higher level one, were
obtained in this paper by a systematic
transformation process. Other trade-offs, not discussed
in this paper, exist between the size of the DIL
program, the size of the syntax tree and the
interpretation time. Thus, a space of DILs exists
for each HLL, and the one selected must be
specified by further constraints and criteria.
Also, precise measures of space and time need
to be developed to place the qualitative
discussions on a quantitative footing.

Most compilers have a code-optimization
phase which performs two functions: machine-
independent optimization and machine-dependent
optimization. The former consists of program
transformations which involve knowledge of the
DIL being compiled into. Such optimization is
generally self-defeating in an HLL interpreter
since the cost of repeated optimization
outweighs the benefits accrued. When designing a
DIL for an HLL, the presence of the optimization
phase in the compiler should not be ignored
since it can alter the structure of the syntax
tree into a directed acyclic graph (e.g., a
common sub-expression's tree may be a sub-tree
for a number of nodes). The stack, by itself,
may not be an adequate vehicle for implementing
such networks. Machine-dependent optimization is
present primarily to bridge the mismatch between
the semantics of the HLL and the machine language.
However, if the "machine" language is designed to
match the HLL, this form of optimization may
prove unnecessary.
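
The remark that common sub-expressions turn the syntax tree into a directed acyclic graph can be made concrete with a short sketch. The function make_builder and the tuple representation of nodes are assumptions introduced for illustration and describe no particular compiler. Structurally identical sub-expressions are given the same node number, so a single node becomes the operand of several parents.

```python
def make_builder():
    """Return (node, nodes): node() creates or reuses a node and yields its id."""
    table = {}   # (op, left id, right id) -> node id
    nodes = []   # node id -> (op, left id, right id)

    def node(op, left=None, right=None):
        key = (op, left, right)
        if key not in table:
            table[key] = len(nodes)   # first occurrence: allocate a new node
            nodes.append(key)
        return table[key]             # later occurrences share the same node

    return node, nodes
```

Building (a + b) * (a + b) with this helper yields four nodes rather than seven, with the sum node shared by both operands of the product.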

The important issue of encoding strategies
for DIL programs has not been touched upon in
this paper and, so, program statistics for the
HLL have not formed an input to the DIL design
process. The encoding technique used can assume
various levels of complexity. To begin with,
the introduction of redundant syntactic and
semantic tokens should be avoided. Assuming that
the interpreter will run on a machine that provides
for accessing arbitrary length bit-strings
(essential for a UHM), the terminals of the DIL
should be assigned codes that contain just enough
bits to differentiate between the terminals that
could have appeared at that point. In this
respect, the grammar of Figure 6 is preferable
to that in Figure 7 since it reduces the inherent
ambiguity at each step. On the other hand,
syntactic tokens are now needed and may cause a
net increase in program size. Finally, a
frequency-based encoding scheme may be employed,
defined either on the linear string or on the
parse tree [12]. The latter scheme will probably
do better, but makes syntax analysis a necessity:
yet another space-time trade-off.
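
The idea of assigning "just enough bits" at each point can be quantified with a small sketch. The function names and the example counts are hypothetical and are not measurements from the paper. If n terminals could legally appear at a given position, a fixed-length code needs the smallest k with 2^k >= n bits there, and none at all when only one terminal is possible.

```python
def bits_needed(n_alternatives):
    """Smallest number of bits that distinguishes n alternatives."""
    if n_alternatives < 1:
        raise ValueError("at least one alternative is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (n_alternatives - 1).bit_length()


def encoded_length(choice_counts):
    """Total bits for a program, given how many terminals could appear
    at each successive position."""
    return sum(bits_needed(n) for n in choice_counts)
```

As a usage example under assumed figures: with a vocabulary of 32 tokens and a grammar that allows any token anywhere, five positions cost encoded_length([32] * 5) = 25 bits, while a grammar that narrows the choices to [4, 2, 1, 8, 2] costs 7 bits for the same five positions.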

The low-level DIL that was obtained is not
radical in nature and, in fact, looks quite similar
to a number of stack architectures. However,
the relationship between features of the DIL and
the HLL is now clearer. Also, issues such as
the instruction formats to be used, which generally
assume a central position in instruction
set design, fall out in a natural manner as a
result of encoding decisions and conform to,
rather than constrain, the other syntactic and
semantic requirements of the DIL.

In conclusion, we do not advocate the direct
interpretation of sophisticated high level languages
since there are far too many costly computations
involved that are best factored out and
performed just once during a compilation phase.
Instead, a well-matched directly interpretable
language should be designed along the lines
suggested in this paper. Thereby, space-time
savings will be achieved and the compilation
process will be facilitated.

BMIR [2]

PROC [49]

BKIH [ij IRT

PUSHI+ [1] ASSIGN {2,0} PUSHI+ [0] ASSIGN {2,1} PUSHVAL+ iv>,i}
PUSHI* [0] TOT BRPC [29] PUSHVAL+ {2,0} PUSHVAL+ {1,0} TRB BBPC
[16]

PUSHVAL+ {1,1} PUSHVAL+ {2,0} ADD ASSIGN {1,1} PUSHVAL+ {2,0}
PUSHI+ [1] ADD ASSIGN {2,0} BRBU [23]

BH7U [4]

RRD BRETURR
IRT

PUSHI* [10] ASSIGN {0,1} CAI.L2 {0,0} PUSHVAL* {0,1} PASSVAL PUSHADDR
{0,1} PASSADDR
SRD HALT


Figure 8. "Assembly" language program. Numbers in "[ ]" represent
literal values; those in "{ }" represent address couples.
The lexical level of the outermost block (main program) is
0, that of the procedure is 1 and the inner block is at
lexical level 2.


Footnotes:

The address couple has the format {lexical level, ordinal number of variable in
the declaration list}. For both, the numbering starts with 0.

Val-or-loc is an explicitly propagated attribute which can assume one of two
values, specifying, respectively, whether the value or the address of the
identifier is required. Since this attribute can assume only two values, it is
better taken care of by assuming two different semantic tokens (op codes) where
necessary; e.g. 49VAL and 49LOC (vide Figure 7).

TWENTY YEARS OF BURROUGHS HIGH-LEVEL LANGUAGE MACHINES

E.  Dean  Earnest 

Burroughs  Corporation 
Mission  Viejo,  California 


Abstract 

A  discussion  is  presented  of  several 
computer  systems  developments  over  the  past  20 
years  at  Burroughs  Corporation.  Some  of  the 
system  design  philosophy  and  concepts  employed 
by  the  system  designers  are  included  to  pro¬ 
vide  an  understanding  of  the  motivation  of 
certain  design  decisions. 


The basic set of machine design and use
concepts was first publicly discussed by Bob
Barton in 1961. The first commercial delivery
of a machine whose design was based on this
approach (the Burroughs B5000) was made in the
early  1960s.  The  concepts  embodied  in  that 
system  have  been  expanded  over  the  past  20  years 
through  insights  made  possible  by  our  accumu¬ 
lated  experience  in  high-level  language  proces¬ 
sing  environments. 

A  brief  discussion  is  presented  of  some  of 
the  concepts  and  design  principles  which  have 
guided Burroughs' computer systems design. A
review of some representative developments from
selected systems design projects is included
with some of the design and use ideas which were
incorporated. 

General  Concepts  and  Ideas 

Burroughs'  computer  systems  architecture  for 
the  past  20  years  is  a  consequence  of  the  artic¬ 
ulation  of  and  adherence  to  a  relatively  small 
set of closely related design concepts and ideas.
Following are representative of these tenets:

Introduction


High-Level Languages


A discussion of Burroughs Corporation's 20
years experience with high-level language
machines should be considered in the context of
some of the concepts and philosophies which
served to guide the system designers.

A  central  theme  which  has  guided  the  devel¬ 
opment  of  computer  systems  for  over  20  years 
at  Burroughs  can  be  characterized  as  follows: 

The  role  of  computer  systems  is  to 
facilitate  communication  between 
people  through  the  amplification  of 
human  capabilities.  Anything  which 
creates  a  distraction  from  the 
achievement  of  this  role  should  be 
regarded  as  being  wrong. 

The  use  of  higher-level  languages  throughout 
Burroughs  computer  systems  is  consistent  with 
that  theme.  The  development  and  evolution  of 
efficient  machine  architectures  to  support 
those abstract notations significantly facilitates
communication.

One of the more important concepts introduced
with  the  Burroughs  B5000  was  a  dedication  to  the 
use  of  higher-level  programming  notation  to  the 
practical  exclusion  of  machine  or  assembly  lan¬ 
guages.  It  was  proposed  and  demonstrated  that 
a  computer  system  could  be  designed  and  imple¬ 
mented  which  would  provide  a  sympathetic  and 
efficient  host  to  an  exclusively  higher-level 
language  processing  environment. 

At  the  time  of  introduction  of  the  B5000, 
higher-level languages were considered to be of
limited  practical  value  in  the  real  world  of 
information  processing.  Their  use  consumed 
vast  amounts  of  resources  (particularly  time) 
for  the  compilation  process. 

The  resource  consumption  for  the  compilation 
process  was  considered  so  severe  that  users 
frequently  abandoned  the  high-level  represen¬ 
tation  of  a  program  after  the  initial  design 
and  an  error-free  compilation.  They  frequently 
completed the testing and patching process in a
more primitive representation. They thereby
avoided solving the basic problem of not having
an  efficient  language  processing  system.  As  a 
result  of  this  multiple  representation,  the 
operational  program  did  not  resemble  the  initial 
high-level  description. 

In addition to the problems with compilation
performance, the object programs executed
significantly slower than the purportedly equivalent
programs written in lower-level notations.
On contemporary machines, both performance
observations were valid. The problems confronting
compiler writers were significant: conventional
machines were not designed to facilitate the
mapping of an abstract notation to the set of
primitive functions supported by those machines.

In  spite  of  these  drawbacks,  higher-level 
languages  achieved  some  acceptance  because  of 
the  now-recognized  advantages  of  their  use  for 
program  design,  implementation,  and  enhance¬ 
ment. 

Since the B5000 was designed to efficiently
handle programs written in ALGOL 60, it was
natural to implement all programs, including
systems software, in that language.18 The use
of higher-level languages for all programming
was critical to the success of the entire project.
The approach permitted a continued
interaction and feedback among the hardware
and software designers, the system implementors,
and the system users. During the course of the
B5000 project and subsequent developments, the
roles of most of the participants in the design
changed. Systems designers subsequently
became software designers. These, in turn,
became software implementors who are included
in the population of systems users. The
continued,  exclusive  use  of  higher-level  lan¬ 
guages  contributes  to  a  fluency  in  those 
languages.  It  also  provides  strong  motivation 
for  the  development  of  an  efficient  system.  At 
Burroughs,  the  system  users  are  system  de¬ 
signers  and  are  expected  to  contribute  to  the 
hardware  and  software  architectures,  implemen¬ 
tations,  and  enhancements. 

The  viability  of  using  higher-level  lan¬ 
guages,  which  was  demonstrated  on  the  B5000, 
reinforced  Burroughs'  commitment  to  the  ap¬ 
proach  on  subsequent  systems  designs  and 
program  product  developments. 

It  should  be  noted  that  while  high-level 
languages  have  achieved  a  certain  acceptance 
today, it is largely due to advances in
compiler  technology.  Some  modern  compilers  do 
achieve  an  acceptable  performance  level.  Else¬ 
where  in  the  industry,  machines  are  not  being 
designed  to  facilitate  high-level  languages. 

The  Design  Team 

A blending of technologies and experience is
required for the design of a commercially viable
computer system. At Burroughs, a system
design team typically consists of a very small
group of people from the several necessary
disciplines. Each participant must, of course, be
well  qualified  in  a  particular  discipline  and 
must  have  a  good  working  knowledge  in  the  other 
represented  areas.  This  cross-discipline  know¬ 
ledge  is  necessary  for  effective  contribution  to 
the  design  and  implementation  decisions. 

There  has  been  much  written  about  the  inte¬ 
grated  hardware/software  approach  to  systems  de¬ 
sign.  Experience  has  shown  that  it  is  not 
sufficient  to  collect  experienced  people  from 
the  contributing  disciplines.  As  Bobby  Creech 
observed  in  his  paper  on  the  B6500  architecture, 
the  attitude  and  the  personality  of  the  parti¬ 
cipants  are  critical  to  a  successful  system  de¬ 
sign.2  Intelligence,  common  sense,  and  previous 
experience  help  considerably,  but  the  successful 
blending of these three attributes requires the
correctness of the contributors' attitude and
personality.

Design  Scope 

Bob Barton, as indicated in his 1961 paper on
a computer system design approach, suggests that
higher-level programming languages should be
employed for all programming tasks to the practical
exclusion of lower-level notations.1
Additionally, he believed that the operation of
the computer system should be under control of
the system itself. This injection of user and
operator perspective into the system design
process implied a much broader utilization of
high-level languages than had been considered
in prior systems. Contemporary machines of that
era attempted to implement a higher-level language
in the hostile environment of a machine/
assembly language system. To provide a consistent
implementation, the design team on the
B5000 broadened their scope of responsibility
to include the entire programming and operational
environment of the system.

Early in the higher-level language system era
at Burroughs, Lloyd Turner and other software
team members developed a particularly effective
graphical representation of the ALGOL language
syntax.2 This representation significantly
clarified the language structure for the team
and permitted new insight into an effective
compiler implementation. Additionally, this
representation and understanding of the language
permitted the definition of consistent extensions
to the language when other components of
systems programming and operation were considered.
The entire software system was
implemented in ALGOL (as was the ALGOL compiler
itself). Since the scope of the systems designers'
responsibility encompassed the entire
hardware, programming, and operational environment,
additional opportunities were available
for the partitioning and implementation of
required functions. Commonly used functions as
well as systems management algorithms were
factored out of the user's environment into the
operating system. Where appropriate, these
functions were replaced in the user's environment
by calling (naming) syntax which was consistent
with the calling language. This system-wide
approach to the use of higher-level languages
provided a natural environment for the
handling of general systems functions. These
functions were represented by a syntax which
was consistent with that utilized for the
systems software. This environment permitted
the development and integration of such
innovations as automatic memory management,
virtual memory and general file management
into the operating system. A description of
the results of this pioneering effort is included
in the B5500 Master Control Program
description.15

The commitment and the adherence to the exclusive
use of higher-level languages throughout
the system produced a systems software and
usage base which could be readily enhanced.

The interface between cooperating software
modules implied by the consistent use of
higher-level abstractions permits new functions
to be easily integrated into the software
system. This abstraction also allows software
systems to be propagated over several generations
of hardware. Software subsystems, such
as the Network Definition Language,4 the Data
Management Languages,5 and augmented operational
dialogues which have been implemented over
the past several years have been guided by the
global perspective suggested by Barton and
enhanced by subsequent software teams.

General  Design  Principles 

The preceding discussion suggests that the
recognition of and adherence to a closely
interrelated set of sound concepts and design
principles provides far-reaching benefits.
This conceptual base is required to be successful
in the typical commercial systems environment
of evolution, growth, and change. In
addition to the concepts and ideas previously
mentioned, the following are representative
complementary design principles which have
proven successful at Burroughs.

Recursive Definition. This simple approach
can be employed to verify the consistency,
completeness,  and  orderliness  of  a  defined 
object.  Several  current  notation  systems  per¬ 
mit  solution  definition  as  a  recursive  process. 

Minimal Representation of Information. Not
all information has the same importance when
considered in a language, program, or system
context. The use of a higher-level programming
notation wherein information can be represented
as appropriate to its static and dynamic usage
frequency offers some interesting options to be
exploited by system implementors. As an example,
Don Knuth has reported on the extremes in
FORTRAN function usage in that operational language
environment.17 This representational
freedom allows for significant systems performance
trade-offs to be effected. Wayne Wilner,
in his paper on B1700 memory utilization, presents
some interesting observations and comments
on the dramatic effects which may be achieved
through optimal information representation.19

The principle of minimally representing
information is consistent with the abstraction of
higher-level  languages.  In  natural  languages, 
also,  people  abstract  and  codify  high-usage  com¬ 
munication  sequences  for  efficiency  and  compre¬ 
hension. 

The Importance of Information Structures.
Burroughs' emphasis on the efficient handling of
information structures, particularly control
structures, has provided far-reaching benefits.
The use of the stack in our machine architectures
for the partitioning and handling of subroutines,
procedures, and processes has permitted the
practical application of several of the concepts
and ideas noted in this paper. Additional benefits
of the use of the stack mechanism include
those which contribute to the multiprogramming,
multiprocessing, information protection, and
control distribution facilities of typical
Burroughs systems.

Abbreviated  History 

Observers  of  Burroughs  systems  developments 
have  detected  a  consistent  philosophy  regarding 
systems  appearance  from  the  perspective  of 
programmers and users. These observers correctly
concluded that the primary impetus for
the control and guidance necessary to maintain
this image is largely attributable to an informal
and long-standing relationship among key
Burroughs technical personnel. This group
shares both a personal rapport and a commitment
to a set of system design and use concepts. In
informal meetings and conversations, Barton,
Lloyd Turner, and others have served as a
catalyst for the elaboration of the original and
the synthesis of new ideas and concepts. With
this common experience as a basis, it is not
surprising that there are repetitions in concept,
approach, and appearance within the several
Burroughs systems.

Following is a brief discussion, not necessarily
in chronological order, of the evolution
of some attributes of higher-level language
oriented systems at Burroughs. Also included
are observations on some of the reasons for
particular developments or emphasis.

The B5000, B6000, B7000 Series

In the late 1950s, Burroughs implemented an
early version of the ALGOL language on the
Burroughs B220, a conventional machine of that
era.  This  implementation  served  to  prove 
several  of  Barton's  original  higher-level  lan¬ 
guage  machine  concepts.  It  provided  a  vehicle 
for  the  evaluation,  feedback,  and  refinement  of 
an ALGOL virtual machine.

The B5000 system was announced in 1961. The
successor B5500, announced in 1964, included a
large, fast secondary storage facility and a
more comprehensive operating system. Further
enhancements were announced with the B5700 in
1970.

The B6500 system, announced in 1966, incorporated
significant enhancements to earlier
machines and integrated many new ideas and
innovations. The B6000 system, which was
announced in 1976, provided a more effective
implementation of the B6000 series architecture.
It also incorporated features and
functions for consistent work and resource
sharing among multiple local and/or distributed
systems.

The B7700 large-scale system, which was
introduced in 1970, provided both source and
object-language compatibility with the B6000
series systems. Additionally, it offered
enhanced performance, information integrity,
and distributed input-output facilities. The
B7100, introduced in 1977, is a higher performance
version of the B7000 series.

Following are typical of the ideas and
concepts of the B5000, B6000, and B7000
systems:

The Stack. Many of the concepts and ideas
previously noted were applied in the design of
the B5000 system. One of the more important
ideas embodied in that machine was the integration
of the stack into the machine architecture.
The stack mechanism is particularly
effective in the ALGOL language handling environment.
The power of the stack lies in the
control mechanism that can be embedded in it
and its use for dynamic temporary storage.
This facility permits efficient evaluation of
arithmetic expressions and storage of parametric
and control information for generalized
subroutine and procedure handling. It also
allows an effective reduction in program storage
requirements since the top of the stack
provides an implied address for most of the
order codes of the machine. A complete description
of the stack and other features of the
B5000 and its successor, the B5500, can be
found in the Burroughs Reference Manual on
these systems.6,7


The stack implementation on the B5000 and
B5500 was enhanced during the design of the
B6500. An evolution of the B6500 stack structure
is employed in the current Burroughs B6000
and B7000 series. Based on experience with the
B5500, the addressing mechanism for local and
global variables was more consistently developed,
so that the dynamic addressing environment
encountered in the execution of programs is
maintained automatically by the stack and related
structures. In addition, the concept of
a "cactus stack" was introduced to provide a
vehicle for the more orderly control of
multiprogramming and multiprocessing. A good treatment
of the use of the cactus stack in process
handling is provided by Jock Cleary in his paper
on that subject.

The cactus stack may be viewed as a tree of
stacks with the trunk containing the basic
operating system process representation. Branches
from the trunk contain control and parametric
information for new processes as they are
created. This structure differs from conventional
trees in that the trunk can continue to
grow after branches have been created. One
graphic representation of this structure
resembles the Saguaro cactus of the southwest
United States, hence the "cactus stack"
designation. The paper by Erv Hauck and Ben Dent
furnished an excellent discussion of the details
of the B6500 stack. Details may be found in
the Systems Reference Manual. Elliott
Organick's book on the B6700 provides a good
treatment of the cactus stack in the context of
an overall system description.
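
The tree-of-stacks view can be sketched in a few lines. The class below is a hypothetical model, not the B6500 mechanism: the names `StackSegment`, `fork_depth` and `visible_frames` are invented for the illustration. Each branch remembers how tall its parent was when the branch was created, so the trunk may keep growing without the branch seeing the newer entries.

```python
class StackSegment:
    """One stack in a cactus stack; branches point back at their parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Height of the parent at the moment this branch was created.
        self.fork_depth = len(parent.frames) if parent else 0
        self.frames = []

    def push(self, frame):
        self.frames.append(frame)

    def _frames_below(self, depth):
        """Frames of this segment up to `depth`, then the ancestors' frames."""
        own = list(reversed(self.frames[:depth]))
        if self.parent is None:
            return own
        return own + self.parent._frames_below(self.fork_depth)

    def visible_frames(self):
        """Addressing environment of this stack, innermost frame first."""
        return self._frames_below(len(self.frames))
```

A short usage example: a trunk receives one frame, a branch is created, and the trunk then grows again before a second branch is created. The first branch still sees only the trunk frame that existed when it was forked.

```python
trunk = StackSegment("operating system")
trunk.push("os-globals")
job_a = StackSegment("job A", parent=trunk)
job_a.push("a-locals")
trunk.push("os-later")            # the trunk grows after job A exists
job_b = StackSegment("job B", parent=trunk)
job_b.push("b-locals")
```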

The Descriptor. The descriptor on Burroughs
systems is a highly-encoded sequence of program
which is executed when it is encountered during
accessing of information. The descriptor may be
regarded as a generalized form of control word.

It is used to separate those functions
associated with the information definition and control
from procedural code. This separation of
description and function facilitates the handling
of data and program while maintaining the
high-level abstraction of the user environment. Good
detailed descriptions of this powerful facility
can be found in the paper by Hauck and Dent and
in Organick's book on the B6700.
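
A rough sketch of the idea that a descriptor is acted on whenever the information it describes is accessed follows. The field names (`length`, `present`) and the `loader` callback are assumptions made for the illustration; they are not the actual Burroughs descriptor layout. Bounds checking and bringing an absent segment into memory happen inside the access itself, so the procedural code never performs them explicitly.

```python
class Descriptor:
    """Hypothetical control word guarding access to a data segment."""

    def __init__(self, length, loader):
        self.length = length      # number of elements described
        self.loader = loader      # called once if the segment is absent
        self.present = False      # presence flag
        self.data = None

    def fetch(self, index):
        """Access one element; checks run as a side effect of the access."""
        if not 0 <= index < self.length:
            raise IndexError("index outside the described segment")
        if not self.present:
            # Stand-in for bringing the segment into memory on first touch.
            self.data = list(self.loader())
            self.present = True
        return self.data[index]
```

The calling code only says `descriptor.fetch(i)`; the definition and control of the information stay with the descriptor, which is the separation the text describes.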
A major objective of the B2000, B3000, and
B4000 systems design was a family of systems
which would be efficient at character handling.
Specifically, the systems were to provide an
effective and efficient host for the COBOL
program environment and for character-oriented
peripherals such as data communication
terminals and magnetic and optically encoded
document handlers.

The B2500/B3500 systems were introduced in
1966. The B2700/B3700/B4700 enhancements to the
series were announced in 1970 and 1971. The
B2800/B3800/B4800 systems, which provided both
higher performance and machine-language
compatibility with earlier systems in the series,
were announced in 1975 through 1977. Many
enhancements to the B2000 series have been
integrated into the B2900 systems which were
announced in 1979.

General Architecture. The experience base
for a machine which could perform well in a
character-oriented environment began with the
B200 systems of the early 1960s and included
observations and experience with the B5000 and
B5500 systems.14

The processor and memory of the B2000-B4000
systems are oriented toward the character,
field, and record requirements of the COBOL
language. The instruction set accommodates
variable-length strings of alphanumeric and
numeric representations.

Because of the dominance of field-to-field
operations in the COBOL operational
environment, the processor was designed to utilize
primarily a memory-to-memory instruction
implementation. Since the processor retained
minimal state between instructions, the system
could quickly respond to interrupts from the
high frequency of input/output operations in a
typical data processing environment. This fast
interrupt response facilitated the handling of
data communications requirements. It also
allowed the handling of the real-time functions
of high-volume document handling peripherals
in a multiprogramming mix.

The machine also incorporated a stack
mechanism to facilitate the handling of control in
the COBOL and operating system environments.
Since the stack was mapped into the memory area
for each program or process, it did not detract
from the rapid state-switching requirements of
the system.

EDIT Instruction. The application of
experience and observations for development and
implementation of character handling language
and functions is typified by the B2000 series
EDIT instruction.

The character handling facilities of the B5000
machine and the necessary primitives to
accomplish the COBOL-specified MOVE and EDIT functions
were not well designed or implemented on that
machine. COBOL was a new programming language
at the time of the B5000 design. There was
little experience with the practical requirements
of that language environment. Additional
information was required on the problem of mapping
the requirements of the MOVE and EDIT functions
on the B5000. The compiler group developed an
enumeration and representation of the functional
requirements defined by COBOL. They then
performed a simulation of the virtual machine
implied by that form and semantics. This
experience and the resultant insights provided a
sufficient basis for the appropriate generators
in the COBOL compiler for the B5000. The
representation, algorithms, and techniques
developed for the B5000 compiler were supplemented by
the results of observations on that virtual
machine. This experience served as a basis for
the design and implementation of the MOVE/EDIT
instruction on the B2000, B3000, B4000 systems.

On  those  machines,  most  MOVE  verbs  in  COBOL  can 
be  performed  by  a  single  instruction. 

Details of the structures and operations
implemented on this family of systems can be
found in the Reference Manual for those systems.11

The  B1000  Series 

The current Burroughs B1000 series (B1700,
B1800, B1900) was designed to support a
multiplicity of high-level language and processing
environments. In addition, the system was
intended to support the emulation of several
existing and/or proposed machines.

The initial systems of the B1000 series, the
B1700s, were announced in 1972 and 1973. The
B1800s, which incorporated significant
performance enhancements, were introduced in 1976.
Initial B1900 systems were announced in 1979.


and  storage  hardware  fetches  and  stores  one  or 
more  bits  from  any  location  with  equal  facility. 


The Design. Based on analysis and experience,
the design team concluded that the range of
representations and functions dictated by the proposed
set of programming languages and machines could
not be directly accommodated with a single,
commercially viable architecture. A sufficiently
small set of structures and operators could not be
defined which was efficient for all languages and
processing environments. A machine architecture
was indicated which could be adapted to each
processing and language requirement.

The B1700 system design included an attempt to
define a machine which had no inherent structure
and no a priori instructions. To satisfy this
design objective, a passive machine was required
which could accommodate definable information
structures and instructions.

The design approach used on the B1700 system
was to anticipate a unique machine architecture
for each programming language and emulation
environment. The designers had to consider both the
typical high-level forms of program representation
as well as machine-language forms from existing
machines. Restated, the B1700 design objective
was to efficiently emulate a set of real and
virtual machines.

Variable-Field Handling. The ability to vary
the machine's image for each emulation
environment implies some very specific hardware and
software adaptations. Fortunately, our experience on
several prior machine designs and research
projects suggested several potential solutions to
this variable-environment processing problem.

It was observed that data and program are
frequently not suited to the representation imposed
by typical word or character organized storage and
processing elements. The actual nature of program
and data demands variable size representation.
Considering the range of storage and processing
environments of the B1700 system, the smallest
unit of information, the bit, must be addressable
in order to provide complete flexibility in the
mapping and processing solutions. To accommodate
this requirement, the B1700 system was designed
with a defined-field storage capability. In this
memory system, all storage is addressable to the
bit, all field lengths are expressible to the bit.
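
The following minimal sketch shows what bit-addressable, defined-field storage means for a program that uses it. It is a software model only; the class name `BitMemory` and its methods are hypothetical and say nothing about how the B1700 hardware realises the function. A field is named by a starting bit address and a length in bits, and no word or character boundary enters into the access.

```python
class BitMemory:
    """Storage addressed to the bit, with field lengths given in bits."""

    def __init__(self, size_bits):
        self.size_bits = size_bits
        self.bits = 0  # whole store held as one integer; address 0 is leftmost

    def _shift(self, address, length):
        if length <= 0 or address < 0 or address + length > self.size_bits:
            raise IndexError("field lies outside the store")
        return self.size_bits - address - length

    def read(self, address, length):
        """Return the field of `length` bits starting at bit `address`."""
        shift = self._shift(address, length)
        return (self.bits >> shift) & ((1 << length) - 1)

    def write(self, address, length, value):
        """Store `value` into the field without disturbing its neighbours."""
        if not 0 <= value < (1 << length):
            raise ValueError("value does not fit in the field")
        shift = self._shift(address, length)
        mask = ((1 << length) - 1) << shift
        self.bits = (self.bits & ~mask) | (value << shift)
```

A 5-bit field at bit 3 and a 13-bit field at bit 8 can sit side by side, and either can be read back with the same two parameters that wrote it.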


The B1700 processor was designed to provide an
efficient vehicle for the emulation of multiple
language processing environments. The
instruction set of the machine included primitives from
the set of programming language and emulation
environments as well as those which contribute to
the emulation, or interpretation, process itself.
For example, the Arithmetic-Logic Unit could be
parameterized to a width which corresponds to the
data or machine being handled. A good exposition
of the B1700 design was provided by Wayne Wilner
in his paper on that subject and is detailed in
the System Reference Manual.12,13 The book by
Organick and Hinds contains an excellent
description of the B1700/B1800 systems architecture
and application.20

Language-Specific Machines. The congruency of
the functions dictated by a processing
environment and the repertoire of structures and
operators supported by a machine generally determines
the efficiency of a system. For the B1000
systems, an "ideal" machine was designed for each
processing environment. Where an existing
machine was to be emulated, the form and semantics
of that machine constituted the definition. After
the machine definition, an emulator, or
interpreter, was developed which provided the semantic
definition of that virtual machine. Thus, the
compiler writers had an ideal machine structure
and operator set for their object code. This
repertoire of structures and operators provided an
isomorphic relationship between most functions
expressed in the high-level language and the
target machine.

Optimization. Since the virtual machine could
be adapted to each processing and language
environment, facilities were integrated into the
design to optimize the adaptations. Tools and
techniques were indicated which could supplement
our perception of the environment with empirical
information.

Both hardware and software facilities were
integrated into the system to permit static and
dynamic observations on the virtual machine's
representation and performance. These
observations were utilized to extend our knowledge base
on these language-specific machines. Virtual
machine definition and representation are changed
as indicated by static and dynamic observations on
the machine's behavior. This technique, and the
adaptability of the machine, has permitted very
effective enhancement and optimization efforts to
be realized.

It should be noted that the exclusive use of
higher-level languages contributes significantly
to the success of the optimization efforts. The
use of abstract programming notations provides the
necessary representational freedom to effect the
indicated virtual machine changes. Some
additional background material and experience with the
application of the systems monitor facility is
provided by Russ Hagen in his paper given at a
computer performance seminar.21 A description of
the supplemental functions provided in a
performance measurement subsystem can be found in the
System Performance Monitor Reference Manual.22

Resource Management. The B1000 systems support
the concept that the machine should manage its own
environment. These systems incorporate the
standard Burroughs set of operating systems scheduling
and other resource management facilities. Program
and information segments are handled automatically
for both interpreter and virtual machine processes.

At a typical installation, several language
environments may be concurrently active in a mix
of programs. Through appropriate information
integrity and resource management mechanisms, each
user views the system as a dedicated facility
designed to effectively accommodate his particular
language environment.

Summary. 

The comprehensibility of communications as a
result of the exclusive use of higher-level
notations throughout Burroughs computer systems
enhances their role in human communication. The
development and evolution of efficient machine
architectures to support abstract information
representations makes the use of higher-level
languages effective and practical.

Acknowledgement

Many people have contributed to the set of
concepts, ideas, and design principles included
in this paper. Their application in Burroughs is
a tribute to the strong commitment and
persistence of Bob Barton and the B5000 team. This
group, and the many participants in Burroughs
developments over the past 20 years, have
expanded and amplified the basic set of ideas.

The author wishes to thank John McClintock and
Barbara Bennett for their conscientious criticism
of various drafts of this paper.
A SURVEY OF HIGH-LEVEL LANGUAGE MACHINES IN JAPAN


Masahiro YAMAMOTO


Nippon  Electric  Co.,  Ltd.,  Central  Research  Laboratories 
4-1-1  Miyazaki,  Takatsu-ku,  Kawasaki  213,  Japan 


High-level language machines have been built in
Japan for most high-level languages. Several
proposals and experiments were made from the late
1960s, and significant research started after 1975.

Most of them are experimental or proposed
machines; there are only a few commercial high-level
language machines. Characteristically, much
LISP and APL machine research has been carried out at
laboratories and universities, while a few FORTRAN
and COBOL machines have been made by computer
manufacturers.

Introduction

This survey report is an overview of the
activities related to high-level language machines
in Japan. Commercial, experimental and proposed
machines are covered. In order to cover most
high-level language machines, more space is devoted to
significant characteristics in their
intermediate-language architectures, hardware structures,
software/firmware/hardware tradeoffs and evaluation
data than to their detailed architectures and
hardware configurations. For easy understanding
and clarification of their differences,
architectural comparisons between high-level language
machines for the same high-level languages are
considered.

High-level language machine research in Japan
has been carried out for most high-level languages.
Most of it, however, concentrates on experimental-level
high-level language machines and there are only a
few commercial-level high-level language machines.
Several proposals and experiments were made at the
end of the 1960s and in the early 1970s. Significant
research efforts started after about 1975, as shown in
Fig. 1.

Generally speaking, it is characteristic that
much research data have been gathered on LISP and
APL machines at laboratories and universities and
a few FORTRAN and COBOL machines have been made by
computer manufacturers.

References are listed at the end of this
report in which the reader can find detailed
information. Unfortunately, most of them are written
in Japanese.
High-Level Language Machines

PL/I Processors

PL/I is the most complex commercial high-level
language. Hence it is time-consuming to manipulate
on a conventional computer. Therefore, the
appearance of advanced and consistent PL/I processors has
been desired for quite a while.

The first significant step in the research on
high-level language machines in Japan occurred with
the proposal for a PL/I processor by M. Sugimoto.
In 1969, he proposed a PL/I processor1 composed of
a translator, called the PL/I reducer, and a
hardware interpreter, called the direct processor.
The PL/I reducer translates a PL/I program into a
list-structured intermediate language, DIPL (Direct
Processor Input Language), that consists of
four parts: Program Structure List (PSL), Statement
Normal Form List (SNFL), Attribute List (AL) and
Constant List (CL). The direct processor consists
of several functionally autonomous units, as shown
in Fig. 2.

The PL/I reducer has been implemented. For
typical scientific programs, the object code length
has been reduced by 25% on the average,
compared to that of the object code generated by
the PL/I compiler available at that time.
According to the timing simulation program for the direct
processor, it was shown that a 28% speed gain over
the conventional computing system can be obtained
for arithmetic/string operations.
Fig. 2 Block Diagram of the Direct Processor


FORTRAN Processors


FORTRAN or array processors are the only
commercial high-level language machines in Japan.
Some of them have actually been used as an attached
or integrated processor in a conventional general
purpose computer system for performance enhancement
of FORTRAN program execution. Also, in accordance
with recent urgent requirements for effective
execution of large scale scientific applications, more
powerful array processors have been planned.


In 1973, S. Takahashi et al. at Hitachi Ltd.
reported results of fundamental, experimental
research efforts on a firmware FORTRAN processor2,
where FORTRAN source statements are translated into
both reverse polish and mixed reverse polish
intermediate texts. In mixed reverse polish, arithmetic
statements are translated into reverse polish texts
and IF statements are translated into normal polish
texts, except for arithmetic expressions in them.
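
To make the term concrete, the sketch below translates an infix arithmetic expression into reverse polish text using the well-known operator-stack (shunting-yard) method. It is an illustration only and is not taken from the Hitachi processor; the precedence table and function name are assumptions, the input is assumed to be well formed, and only the four binary arithmetic operators are handled.

```python
# Assumed precedence table for the four binary arithmetic operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_reverse_polish(tokens):
    """Translate well-formed infix tokens into a reverse polish token list."""
    output = []
    operators = []
    for token in tokens:
        if token in PRECEDENCE:
            # Emit pending operators that bind at least as tightly.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the matching "("
        else:
            output.append(token)  # operand
    while operators:
        output.append(operators.pop())
    return output
```

With this translation, A + B * ( C - D ) becomes A B C D - * +, which an interpreter can execute strictly left to right with a single operand stack.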

The authors concluded that the execution time
ratio for reverse polish and mixed reverse polish
built in microprograms, reverse polish in software
and object machine codes is 0.5 : 1.3 : 9.7 : 1, based
on a FORTRAN dynamic statement mix. On the other
hand, the object memory capacity ratio is 0.53 :
0.58 : 0.58 : 1, based on a FORTRAN static statement
mix.

The FACOM 230-75 APU (Array Processor Unit)3,4
from Fujitsu Ltd. is a pipelined vector machine
attached to a FACOM 230-75 system in which the APU
and CPU (Central Processor Unit) share the main
memory (Fig. 3). The APU machine structure is
characterised by various kinds of internal
registers (vector registers, data registers and base
registers), vector descriptors and powerful vector
instructions for array or vector operations. A
FORTRAN user's program is written in AP-FORTRAN
which is an extension of standard FORTRAN to
include vector functions. It was indicated that the
maximum APU performance is 22 Mega Floating-Point
Operations and the APU system performance of
various application programs written in AP-FORTRAN is
4-20 times that for corresponding CPU programs.

An  APU  system  was  installed  in  Japan's  National 
Aerospace  Laboratory. 

The IBM System/360 - 2938 AP and the FACOM
230-75 APU are processors attached to the central
processor through an I/O channel or a shared main
memory. This approach has two problems: a large
amount of hardware is necessary, and a special
description using non-standard FORTRAN is
required. To solve them, Hitachi Ltd. developed the
M-180 IAP (Integrated Array Processor)5 where array
processing functions are included within a central
processing unit as a general instruction set (vector
instructions). A concise vector instruction set,
consisting of 28 instructions, was selected based on an
analysis of the statistics on the behaviour of
FORTRAN programs, obtained using a software tool,
FORMAT e. In the M-180 IAP, FORTRAN user's programs
written in standard FORTRAN are vectorised through
the vectorising FORTRAN compiler6. It was shown
that about 50% of the benchmark programs, measured
in execution steps, can be vectorised by the 28
vector instructions.
Fig. 3 The FACOM 230-75 APU System Configuration

BASIC Machines

In 1974, Y. Nagai, M. Yamamoto et al. of NEC
Ltd. quantitatively analysed software/firmware/
hardware tradeoffs in a BASIC interpreter. For this
purpose, three kinds of high-level language
machines, a software-implemented BASIC interpreter
(S-BASIC), a firmware-implemented interpreter
(F-BASIC) and a firmware-implemented interpreter with
additional hardware (H-BASIC), were implemented.
F-BASIC7 is implemented with firmware on the
General Purpose Microprogrammed Simulator (GPMS)8. To
reinforce the F-BASIC performance, hardware
functions, such as transfer/pointer operations,
associative functions and so on, were introduced into the
H-BASIC9 on the microinstruction level. Each BASIC
processor translates a BASIC program into the same
intermediate language, and then interprets it.
Experimental results9 show that 17 times
performance improvement is obtained by adopting firmware.

3.6 times more performance improvement was obtained
by introducing appropriate hardware functions. The
memory capacity necessary for a language processor
was also reduced.

M. Yamamoto, an implementor of the preceding
experiment, proposed an advanced high-level
language architecture10 for a BASIC machine as an
extension of the above three BASIC interpreters in
1975. The BASIC machine is capable of both
translation and interpretation of a BASIC program and is
characterised by a tagged architecture, a large
number of general purpose registers and powerful
machine instructions. In addition, bit-handling,
masking and table-pointer operations are also
installed. It was estimated that the BASIC machine
performance is about 2 times that of F-BASIC.

T. Maruyama of Himeji Institute of Technology
made a BASIC interpreter11,12 on a general purpose
minicomputer, HP-21MX. Using a software translator,
BASIC programs are translated into intermediate
languages, which are interpreted by a firmware
interpreter. In the interpreter, commonly usable
functional routines, such as table pointer/entry
manipulations, data conversions and arithmetic
operations, rather than routines for the whole of a
special statement, are implemented with microprogram
techniques, based on execution frequency evaluation
data. The microprogram amount is about 1.3k words.
A firmware BASIC interpreter is about 4 to 9 times
faster than a software version on benchmark test
programs.

COBOL  Machine 

COBOL is the most commonly used commercial
programming language. It is used for some 70% of
all programming. Therefore, conventional
computers with specialized functions or
architecture for COBOL, and COBOL machines, have already
appeared at the commercial level overseas.

On the other hand, in Japan an experimental
COBOL machine13 similar to NCR COBOL Virtual Machine
has been under implementation since 1975 in NEC
Ltd. The COBOL machine architecture, called COMBAT
(COBOL Oriented Machine Basic Architecture), has
many facilities for efficient COBOL program
execution, e.g. many internal data, data descriptors and
intensive COBOL function capabilities. The COBOL
machine hardware is functionally composed of three
processor modules for instruction fetch, operand
fetch and instruction execution as shown in Fig. 4.
It was indicated that the COBOL machine execution
time14,15 is about 3-5 times faster than that in a
medium scale conventional computer. The COBOL
machine is running as a processor attached to the
conventional commercial computer.


AC: Advance Controller
ILF: Intermediate Language File
FIFO: First In First Out Memory
IFPM: Instruction Fetch Processor Module
OFPM: Operand Fetch Processor Module
EXPM: Instruction Execution Processor Module
MCPM: Memory Control Processor Module
MM: Main Memory

Fig. 4 COBOL Machine Configuration

LISP Machines


The most researched high-level language machine
in Japan is a LISP machine. Since the LISP language
has many intensive characteristics, e.g. dynamic
data allocation, recursive function call and list
processing, it is impossible to effectively execute
LISP programs on conventional computers. Increase
in research areas for symbol manipulation and advent
of low cost, highly functional and easily usable
microprocessors have been accelerating the demand
for LISP machines since 1970 in Japan.

An early experiment on a LISP machine was made
by T. Shimada et al. of Electrotechnical Laboratory
(ETL) in 1974. LISP machine research in ETL has
been performed in three steps. The first experiment
involves a microprogrammed LISP interpreter16,17 on
a user microprogrammable computer, HP-21MX.
A Bobrow stack model is implemented with
microprogram techniques, on which a LISP interpreter is made
with LISP oriented highly efficient instructions.
Also backtracking and coroutine functions are
adopted. It was concluded that execution about 5 to 6
times faster than HP-2100 machine instruction codes
is attained. Moreover, basic evaluation data about
the microprogrammed LISP interpreter were obtained.
It is shown that highly efficient decision making
including multi-path jump, recursive call at the
microprogram control level, bit manipulation and main
memory control are effective for a LISP interpreter.
Based on these evaluation data and experience, a
new LISP machine (ETL LISP B)18 was implemented on
a universal emulation machine, ACE (Adaptive
Computing Element). Internal data forms and
interpreter structure for this LISP machine are
identical to the HP-21MX version. In order to attain
better performance, however, all the interpreter
is written in microprogram, and stack configuration,
hardware register utilisation and memory management
are improved due to using advanced ACE hardware
facilities.

In addition, a virtual LISP machine19 is being
implemented on a powerful 16-bit microcomputer,
whose conceptual structure is shown in Fig. 5.
In the virtual LISP machine, intermediate language
instructions directly corresponding to LISP
functions are considered.


Fig. 5 Conceptual Structure of the Virtual LISP
Machine


LISP machine NK320,21 of Kyoto University is
based on a LISP oriented special processor, which
has 32-bit data length, 42-bit microinstruction
length and 64-bit list-cell length. Also, it has
special hardware units, such as a transfer table
for generating microinstruction branch addresses to
aid checking for tag field and data category and a
hardware stack, whose top areas are always stored
in a fast buffer memory. NK3 has about 100
macroinstructions mainly for stack and tag manipulation,
in order to effectively execute LISP functions.
The processing speed of a LISP interpreter on NK3
is 5-6 times that of a LISP system on a general
purpose minicomputer. Figure 6 shows an NK3
block diagram.


Fig. 6 Block Diagram of LISP Machine NK3


Research on LISP machines in Japan was
promoted by the advent of low-cost, high-performance
and easily usable microprocessors, especially bit
or byte slice microprocessors.

K. Taki et al. at Kobe University developed a
LISP processor22,23, organised with 4-bit slice
microprocessors (Am 2900 series), which has 16-bit
data length, 56-bit microinstruction length and
32-bit list-cell length. It also has special
hardware components characterised by a 16-bit 4-k word
hardware stack, a field extractor for data masking
and shifting, a 3-bit 1 k word mapping memory
generating a 3-bit usage code corresponding to the
main memory address and a 1-bit 64 k word bit-table
supporting garbage collection function. Figure 7
shows the hardware structure for the Kobe University
LISP machine, which is connected to a general
purpose computer, FACOM 230-38, through an 8080
microcomputer. A DEC LSI-11 minicomputer performs
initiation and maintenance functions, LISP program
loading and input/output operations.


Fig. 7 Hardware Configuration of a LISP Machine
System


T. Usuki et al., from Keio University, implemented
a LISP machine [24, 25] on a multi-microprocessor
system, which is composed of an interpreter
processor (IP), a storage management processor (SMP)
and an input-output processor (IOP). IP performs
overall control of LISP program processing and LISP
program interpretation, and has a 16-level hardware
stack for sequence control and list manipulation
capabilities. Garbage collection and execution of
the CONS, RPLACA and RPLACD functions are achieved
independently of interpretation on SMP, which is
organised of byte-slice microprocessors, 32 special
registers and a writable control storage. The garbage
collection function is based on Dijkstra's
algorithm. IOP, a general purpose minicomputer
(NOVA), accomplishes input of a LISP
S-expression, its conversion to internal format
and file processing. Figure 8 shows the configuration
of an experimental multiprocessor system.

H. Yasui et al. of Osaka University have been
developing a new multiprocessor LISP machine, the EVLIS
machine [26, 27]. In a traditional multiprocessor LISP
machine, list processing and garbage collection or
I/O processing are performed in a parallel mode.
In the EVLIS machine, on the other hand, each argument
for a LISP function, EVLIS, is evaluated in parallel
on multiple processors. It is based on the concept
that parallel interpretation of EVLIS arguments is
possible if the evaluation of one argument does not affect
another argument through a list alteration
operation. Figure 9 shows the system configuration
of the EVLIS machine, in which an evaluation
processor can accomplish an argument interpretation.

An evaluation processor is organized of Intel 3000
series bit-slice microprocessors, and has a 20-bit
data length and a 50-bit microinstruction length.
A 40-bit list cell can be brought into a CAR-CDR
register from main memory. When garbage
collection is required, all evaluation
processors stop interpreting EVLIS arguments
and perform this function in parallel. A simulation
result related to the performance enhancement
due to multiple processors was shown in the paper [27].
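The EVLIS idea of evaluating the arguments of a function in parallel can be illustrated with a minimal software sketch. It is only an analogy under stated assumptions: expressions are nested Python tuples, the primitive table `PRIMITIVES` and the function names are hypothetical, and a thread pool stands in for the evaluation processors. As in the text, it is valid only when the arguments have no side effects on one another.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

# Hypothetical primitive table; a real LISP has far more functions.
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def apply_op(op, values):
    """Fold the evaluated arguments with the named primitive."""
    return reduce(PRIMITIVES[op], values)

def evaluate(expr):
    """Sequentially evaluate a nested tuple such as ("+", 1, ("*", 2, 3))."""
    if not isinstance(expr, tuple):
        return expr  # atoms (numbers) evaluate to themselves
    op, *args = expr
    return apply_op(op, [evaluate(a) for a in args])

def evaluate_parallel(expr, workers=4):
    """Evaluate the top-level arguments concurrently, one task per argument."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(evaluate, args))  # result order is preserved
    return apply_op(op, values)
```

Only the top-level argument list is distributed here; each worker evaluates its own argument sequentially, which keeps the sketch free of nested-pool deadlocks.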

Typical LISP machines have been surveyed.
Table 1 shows a summary of their major characteristics.
In addition, there are other research efforts
related to LISP machines. ALPS/I (Aoyama
List Processing System/I) [28] is a compact, low-cost
LISP machine on a universal 8-bit microprocessor.
E. Goto, T. Ida et al., of the Institute
of Physical and Chemical Research, are designing
a machine for numerical, symbolic and associative
computing, FLATS (Fortran and Lisp machine
with Associative features for Tuples and Sets) [28].

In FLATS, overflow-free and variable precision
arithmetic, table look-up computation, and associative
computation are realized by hashing hardware,
a tag mechanism and hardware list processing.


Fig. 9 System Configuration of EVLIS Machine


Table 1 Architectural Comparison between LISP Machines
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APL  Interpreters 

APL has many features to be implemented by
firmware/hardware techniques, some of which are (1)
dynamic data and dimension attributes associated
with variables, (2) various operators to be applied
to vector and array operands, and (3) a large number
of nonstandard operators. Moreover, because
APL allows dynamic data handling and because it is
an interactive language, data type checking,
subscript checking and text editing are to be performed
at execution time.

In order to overcome the inefficiency of APL
software interpreters due to these features, some
microprogrammed APL interpreters, similar to IBM
Hassitt's machine, have been experimentally implemented
on microprogrammed computers since 1975 in Japan.
Various quantitative evaluation data about firmware
effectiveness in an APL interpreter were
accumulated.

In 1975, an early experiment on a firmware
APL computer was made by T. Motooka et al. at
Tokyo University on an experimental machine, PPS1 [43].
An APL source text is translated into an intermediate
language on a one for one basis by a lexical
analyser written in a microprogram. The intermediate
language is composed of identifiers, operators,
constants and brackets. The order of elements for
a statement is the same in the internal representation.
The interpreter is written in microprograms and
APL*. Both the lexical analyser and the interpreter
are implemented on a microprogrammed experimental
computer, PPS1. The authors concluded that the
firmware APL computer is much slower than an APL
machine in software on scalar operations, but faster
on many vector operations.
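A one for one lexical translation of this kind can be sketched as follows. This is an illustrative assumption, not the PPS1 microprogram: APL symbols are replaced by ASCII stand-ins (for example `<-` for assignment), and the token classes simply follow the four kinds named above. The elements are emitted in source order, so no reordering or parsing takes place at this stage.

```python
import re

# Assumed token classes: constants, identifiers, brackets and operators.
TOKEN_RE = re.compile(r"""
    (?P<constant>\d+(?:\.\d+)?)        |
    (?P<identifier>[A-Za-z]\w*)        |
    (?P<bracket>[()\[\]])              |
    (?P<operator><-|[+\-*/<>=,])
""", re.VERBOSE)

def lex(source):
    """Translate a statement into (class, text) elements, one per token."""
    elements = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1          # blanks separate tokens and are not emitted
            continue
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        elements.append((m.lastgroup, m.group()))
        pos = m.end()
    return elements
```

For instance, `lex("A <- B + 12")` yields five elements in the same order as the source tokens.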

H. Miyawaki et al. of Himeji Institute of
Technology made a firmware APL interpreter [31],
based on a quantitative analysis [32] of the
interpretation part, which is implemented in software
on a general purpose minicomputer, HITAC-10. An
APL source statement is translated into an
intermediate text which is composed of 32-bit text
elements followed by an end element, as shown in Fig. 10.
It was indicated that, in a firmwarization,
appropriate functional modules, frequently used to
implement an APL interpreter, are to be selected
rather than all of an APL statement. As a result
of this experiment, it is shown that a firmware
interpreter, made of about 4.8-k words of
microprograms, is 6 times faster than a software version.

Fig. 10 Source Statement, Intermediate Text
and Its Element


Y. Horimoto from Toshiba Ltd. implemented a
firmware APL interpreter, the APL/EPOS I interpreter,
on an EPOS (Experimental Polyprocessor System)
system [44], whose component processor is organised of a
universal host microprocessor, PULCE (a high
performance universal computing element) [45], dedicated
to emulation with powerful microinstruction sets,
various kinds of hardware registers and so on.
APL source statements are translated into intermediate
texts similar to those of the preceding firmware APL
interpreter by a translator written in a pseudo APL
language (PAPL), which is emulated with
microprograms. On the other hand, intermediate texts are
interpreted by PAPL and microprograms, and the
microprograms mainly perform scanning of intermediate
texts, decision on the operation category to be
manipulated and execution of basic APL operators.
According to evaluation data, the APL/EPOS I interpreter
is 100 times faster than a software version on
some APL functions. Also, it is faster than the
execution of object codes generated by a compiler.

Moreover, another similar research effort
has been carried out on a dynamically microprogrammable
computer, QA-1 [46], by Kinoshita et al. of Kyoto
University. Various unique experimental results
will be obtained because of many special QA-1
features, e.g. hardware stacks, low-level parallel
processing capabilities due to using four ALUs and
tag manipulation functions.

PASCAL  Machine 

The use of a structured high-level language,
PASCAL, is increasing due to its high portability,
programmer/execution-efficiency and the compactness of its
language processing system. At the same time, in
order to execute PASCAL programs effectively, PASCAL
machines, such as the PASCAL Microengine of Western
Digital Corp., have appeared.

T. Furuya of ETL experimentally implemented a
Concurrent Pascal Machine [37] on the multiprocessor
system (ACE) [42], based on P. B. Hansen's Concurrent
Pascal Machine. An interpreter to execute Concurrent
Pascal Machine (CPM) instructions and a Kernel
to supervise parallel processes were made with both
PDP-11/45 instructions and a CPM oriented language
(C-language), which were emulated with ACE system
microprograms. C-language consists of conventional
machine instructions like those of the PDP-11/45 and
frequently used CPM instructions. In order to execute
multiple processes in parallel on a multiprocessor system,
process synchronisation instructions and I/O
operations, having a process schedule function, are
introduced to the Kernel with the aid of an ACE
synchronisation module. As a result of the experiment,
various valuable evaluation data were shown, and
a great decrease in overhead time was attained by
parallel execution of processes and efficient
process switching.

Other  Research  Efforts  on  High-level  Language 
Machine  Design  Problems 


In addition to the high-level language machine
implementation efforts described earlier, a number of
other research efforts related to high-level
language machine design problems have been made. The
intermediate language architecture of a high-level
language machine is one of the major keys for
successful implementation. Some evaluations [38, 39] on this
problem were accomplished. Moreover, the problem
of a multilingual high-level language machine was
considered.


Summary

High-level language machines in Japan were
surveyed. Generally speaking, most of them are at the
stage of fundamental and experimental research
completion. In the future, the appearance of regular
commercial high-level language machines and the
confirmation of their effectiveness will be desired.
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ABSTRACT 

The SYMBOL system is the prime example of
the actual construction and use of a high level
language computer. It is unique in the architecture,
the instruction set, and the language. This paper
attempts to summarize some of the lessons learned
from the machine during the last eight years of its
use. Comments are made on the high level instruction
set, and how the descriptor and tag mechanisms
affected the system. Several of the processors are
discussed, including the automatic memory management
and the hardware implemented operating system.
The difficulties encountered in debugging the
hardware and the software are compared.

Introduction 

One of the most radical computer architectures of the last decade
was unveiled in 1971 with the announcement of the SYMBOL [1, 2]
computer system. The prime goal of the SYMBOL research project was to
demonstrate with a full-scale working computer that a procedural
general-purpose programming language and a large portion of a
time-shared operating system could be implemented directly in hardware,
resulting in a marked improvement in computational rates [3]. A further
goal was to show that such a task could be mounted by a relatively
small group of people in a reasonable amount of time through the use
of appropriate design tools and construction techniques. The
announcement and initial papers on this computer system were made
at a time when it was not yet fully operational, and was being moved
to Iowa State University for final debugging, evaluation and use.
After arrival at ISU the computer was made fully operational, and was
used in a programming environment.

It would be nice if a definitive statement could be made neatly
categorizing all of the successes and failures of the project.
Unfortunately, such data was remarkably difficult to collect; project
members still disagree on many issues. Part of the problem in
evaluating SYMBOL was that the machine was radically different from
traditional computers in so many ways that a controlled comparison was
practically infeasible. Nevertheless, we feel it is important to state our
opinions; it should be understood that the following comments are
personal observations by the authors, based upon four years of daily
contact with the SYMBOL machine. In defense of the original designers
of the machine, we feel it necessary to reiterate that SYMBOL was
intended as a learning device, rather than as a commercially viable
product.

Work at Iowa State University under NSF grant OJDOV7X


Background 

The roots of SYMBOL go back as far as 1964, when it was
decided by a group of engineers at Fairchild's research facility in Palo
Alto, California that the future of integrated circuit technology
dictated the use of hardware for traditional software functions. The
design of the system was, and still is, a unique example of a completely
top down design. It was felt that existing programming languages had
been influenced too heavily by the underlying hardware, and that
valuable programmer time was unnecessarily being spent performing
functions such as memory management because of unreasonable computer
architectures. A high level language computer was seen as an answer
to reducing rising software costs.

One of the first tasks tackled was the specification of a new
programming language (SPL) [4, 5] along the lines of ALGOL 60 and PL/1,
but without underlying machine influences. The language was
designed for processing character oriented data that could be variable
in type, shape and size. Rigid type and size declarations that would
normally aid a compiler were omitted from the language as they were
seen to burden the user; conversions and space management were
handled automatically by SYMBOL's hardware. Structures of arbitrary
shape were to be explicitly representable in the language. A top down
design was derived from the language specification and the desire to
support multiple users in an interactive environment. Part of the
research effort was to probe the limits of hardware; even such
traditional software functions as the text editor were put in hardware. The
system was designed so that a user could walk up to a cold computer,
turn it on, and have all the functions necessary to begin programming
in a high level language using virtually no system software. The
resources needed to design this complex hardware were substantial. A
computer aided design system [6, 7] was developed to check timing and
loading, to do placement and wire routing, and to maintain a system
for documenting the circuitry of more than 20,000 packages.

At the time that the fabrication of SYMBOL was completed and
debugging began, the semiconductor industry was in a recession and a
managerial decision was made not to continue the project through a
second design that Iowa State University was to have received for
evaluation. Instead ISU obtained the original machine from Fairchild
in 1971, through a grant from the National Science Foundation, for
the purpose of bringing the machine to full operation so that the
unique ideas of the architecture could be more fully documented and
evaluated. At ISU the machine was brought into useful operation by
197?. Work on the system software and hardware was done by a
group of about six people, mainly graduate students. Funding for the
project terminated in 1978, and shortly afterwards hardware failures
forced the machine to be permanently decommissioned.


Experience with a High Level Instruction Set

The SYMBOL instruction set [8, 9] reflects the SYMBOL Programming
Language with almost a one-to-one correspondence between
tokens in the source and the object code. The hardwired Translator
takes a source program and generates an internal postfix representation
to be executed by the Central Processor. All operators are generic; the
types of operands are determined from the descriptors and type tags
associated with each identifier or constant. The instruction set is
aesthetically appealing in its simplicity. There are approximately fifty
instructions, only six of which require an address field. All references
to identifiers are made with an instruction that contains the address of
the identifier's descriptor. Constants may appear in-line and are
always tagged. The advantages of the instruction set would appear to
be its semantic conciseness and uniform mechanism for referencing
data.
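The flavor of such an instruction set can be conveyed by a minimal sketch of a postfix stack machine with generic, tag-driven operators. Everything in it is a hypothetical illustration rather than the SYMBOL encoding: the `Descriptor` class, the tag names `"int"` and `"str"`, and the three opcodes `ref`, `const` and `add` are assumptions chosen for brevity.

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    tag: str        # assumed tags: "int" or "str"
    value: object

def to_int(d):
    """Generic operators inspect the tag and convert operands as needed."""
    return d.value if d.tag == "int" else int(d.value)

def run(code, memory):
    """Execute postfix code; 'memory' maps addresses to descriptors."""
    stack = []
    for op, arg in code:
        if op == "ref":          # instruction carrying a descriptor address
            stack.append(memory[arg])
        elif op == "const":      # in-line constant, already tagged
            stack.append(arg)
        elif op == "add":        # generic: operand types come from the tags
            b, a = stack.pop(), stack.pop()
            stack.append(Descriptor("int", to_int(a) + to_int(b)))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# X + 1 in postfix, where X happens to be stored as character data.
memory = {0: Descriptor("str", "41")}
code = [("ref", 0), ("const", Descriptor("int", 1)), ("add", None)]
result = run(code, memory)
```

The same `add` opcode serves both operand representations, which is the semantic conciseness the text refers to; the cost is that every operation must examine tags at run time.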

Code  compaction 

There are several problems with the high level nature of the
instruction set, only a few of which are specific to SYMBOL. The
high level and postfix stack orientation of the instruction set were
expected to give good code compaction. Closer examination however
revealed that SYMBOL's code was much less compact for typical
programs than on traditional machines such as the IBM Ml or PDP-11.
Several factors account for this poor code density. A substantial
fraction of the object code consisted of non-functional "end of statement"
operations, debugging links pointing to the source program and
No-Ops. Code density was also lost due to the fact that opcodes, which are 1
byte in length, could be placed only in the first or fifth bytes of the
eight byte word, thus wasting three bytes for each opcode that did not
require an address field. The Translator contributed to the problem by
producing extremely poor code, at times even replicating
non-functional instructions. The strict one-to-one correspondence between
source and object code resulted in the absence of many instructions
that could have been useful in optimizing for common special cases.
Examples of such instructions would be increment, set to zero, and
append a character. The unusual memory structure also hindered code
compaction by prohibiting any address calculations, thus precluding
space saving using relative addressing techniques. The lesson learned
was that code compaction does not necessarily result from high level
instructions, and that factors of two or three in code density can be
lost without careful integration of the instruction set, compiler
technology and the memory structure.
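The alignment penalty can be made concrete with a small calculation. The sketch assumes that every instruction occupies one four-byte half of the eight-byte word (a one-byte opcode plus either a three-byte address field or three unused bytes); that slot size is inferred from the "three bytes wasted" remark and is not stated exactly in the text. The function names are hypothetical.

```python
SLOT_BYTES = 4      # assumed: opcodes start only at byte 1 or byte 5 of a word
OPCODE_BYTES = 1
ADDRESS_BYTES = 3   # assumed size of the address field

def aligned_code_bytes(with_address, without_address):
    """Bytes used when every instruction fills a whole four-byte slot."""
    return SLOT_BYTES * (with_address + without_address)

def packed_code_bytes(with_address, without_address):
    """Bytes used if opcodes could instead start at any byte position."""
    return ((OPCODE_BYTES + ADDRESS_BYTES) * with_address
            + OPCODE_BYTES * without_address)

def wasted_fraction(with_address, without_address):
    """Share of the aligned code that is padding."""
    aligned = aligned_code_bytes(with_address, without_address)
    packed = packed_code_bytes(with_address, without_address)
    return (aligned - packed) / aligned
```

For a fragment with 2 address-bearing and 6 address-free instructions, the aligned layout takes 32 bytes against 14 bytes packed, so more than half of the space is padding under these assumptions.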

High  Level  Instructions  and  Interrupt  Handling 

An unexpected lesson was that there are times when instructions
can be at too high a level. Because of the variable length operands
and high level operations, hundreds or even thousands of memory
references could be required to execute a single instruction. This had
rather severe consequences on interrupt handling (page fault, disk
servicing, user interrupt, process switch, etc.). Proper interrupt handling
requires the ability to stop execution, handle the interrupt, and then
resume execution of the original instruction at the point of the
interrupt. For efficiency reasons it is important to be able to stop execution
of an instruction (without completion), save all state information active
in the processing of the instruction and resume execution at or near
the point of interruption rather than to restart execution of the
instruction from the beginning. For a high level algorithm, the state
information that must be saved can be rather large. A large fraction
of SYMBOL's design bugs were the result of the failure to save all the
necessary state information. This type of bug was extremely difficult
to track down, as the fatal interrupt was often generated
non-deterministically from combinations of disk interrupts, clock time-outs
or users pressing interrupt buttons. Another problem was the inability
to save all the necessary information for particular steps of the
algorithm. These oversights were eventually fixed, sometimes at the
expense of saving state information at "convenient checkpoints".
Restarting at such checkpoints repeated needless work after task
shutdowns, and worse, caused hundreds of times more state saves than
were necessary; this degraded system performance perhaps as much as
un .
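The cost of resuming from a checkpoint rather than from the exact point of interruption can be sketched with a toy model. It is an illustration under assumptions, not a description of the SYMBOL hardware: one instruction is modeled as a fixed number of memory references, interrupts arrive at given step numbers, and `checkpoint_every=1` stands for saving complete state at every step. The extra state saves mentioned above are not modeled.

```python
def run_instruction(total_steps, interrupt_at, checkpoint_every=1):
    """Count memory references needed to finish one long instruction.

    After an interrupt, execution rolls back to the most recent checkpoint,
    so the steps since that checkpoint are repeated.
    """
    pending = sorted(set(interrupt_at))
    step = 0
    references = 0
    while step < total_steps:
        if pending and step == pending[0]:
            pending.pop(0)                    # service the interrupt once
            step -= step % checkpoint_every   # roll back to last checkpoint
            continue
        references += 1
        step += 1
    return references

exact = run_instruction(1000, [250, 999], checkpoint_every=1)
coarse = run_instruction(1000, [250, 999], checkpoint_every=100)
```

With exact resumption the instruction costs its nominal 1000 references; with checkpoints every 100 steps the two interrupts add 50 and 99 repeated references.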

Optimization 

Code optimization in SYMBOL would be difficult to achieve
because of the generalized nature of the operations. The addition of
lower level instructions could have allowed optimization of many
special cases. For example, incrementing a variable on SYMBOL could
take over a dozen memory references due to its stack mechanism and
indirection through descriptors. The uniform referencing to data
structures meant that a compiler could not optimize accessing for special
cases; in particular a tremendous performance penalty was paid with
SYMBOL because the memory structure made it impossible to perform
traditional indexing and address calculations. Even if such indexing
were possible, there would be an incompatibility because of the
inability to do binary arithmetic for addressing on the decimal only
machine.

Descriptors and Tags

Because SYMBOL was one of the few examples of a descriptor
based machine and a tagged architecture, a few comments are
appropriate. Operand and instruction tagging was useful in catching
occasional machine errors where, for a number of reasons, a memory
reference returned an incorrect value. There were never any instances
where data could possibly be mistaken for program or vice versa; this
did in fact report many machine errors that might have gone
undetected in a traditional machine. Tags were also of great benefit in
debugging and in developing sophisticated software debugging tools.

Descriptors had an even stronger impact on SYMBOL, both
positive and negative. Descriptors were invaluable in efficiently
implementing the dynamic typing present in the language and in the benefits
provided for debugging tools. On the other hand, implementing
recursion in the SYMBOL Programming Language was a task left to system
software, and turned out to be extremely inefficient. A simple test of
Ackermann's function would show SYMBOL to be at least three
orders of magnitude slower than traditional machines. The main problem
was that the descriptors for the entire procedure had to be copied upon
a recursive call if the descriptors themselves might be modified in the
call -- a virtual certainty in SYMBOL.
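Ackermann's function is a common stress test for recursion because even small arguments generate many calls. The following sketch counts the calls and, under the purely hypothetical assumption that a fixed number of descriptors (`descriptors_per_call`) is copied on each one, the resulting descriptor copies. It does not reproduce any SYMBOL measurement.

```python
def ackermann(m, n, stats, descriptors_per_call=3):
    """Ackermann's function, instrumented to count calls and copies."""
    stats["calls"] += 1
    # Assumed cost model: every call copies the procedure's descriptors.
    stats["descriptor_copies"] += descriptors_per_call
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1, stats, descriptors_per_call)
    inner = ackermann(m, n - 1, stats, descriptors_per_call)
    return ackermann(m - 1, inner, stats, descriptors_per_call)

stats = {"calls": 0, "descriptor_copies": 0}
value = ackermann(2, 3, stats)
```

Even the small case A(2, 3) = 9 takes 44 calls, so any per-call copying overhead is multiplied many times over.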

Need for a Systems Language

One of the problems with the SYMBOL language and instruction
set was that they were not efficient for lower level tasks common to
systems programming. The support tools on SYMBOL could have
been more effectively supported through a systems oriented language
such as BCPL [10], BLISS [11], or C [12]. While inefficiencies in short lived
user programs could be tolerated, the same can not be said for system
software. The SYMBOL Programming Language turned out to be
inappropriate for systems programming. It is recommended that even
on computers that intend to support only one user language, a
significant effort should go into supporting an underlying systems language.
Addition of a few lower-level instructions could have made SYMBOL
an effective multi-language system.

System  Software  and  the  Hardwired  Operating  System 

The functions of a complete time-shared operating system were
implemented directly in hardware by the System Supervisor [13], aided by
the Memory Controller, Memory Reclaimer, Channel Controller,
Drum Controller, and Input/Output Processor. System software was
intended only to handle certain exceptional conditions, but in fact was
used to a much greater extent than the designers originally foresaw.
Substantial efforts of the research team were spent on developing
loaders, text editors, improved diagnostics, debugging packages, library
routines and a file system. This software was seen as essential to make
the system/user interface tolerable. System software accounted for
several thousand lines of code by the end of the project. Much of the
success of this software was due to the foresight of the designers in
providing "hooks" in the hardware for software intervention, allowing
the system to retain some flexibility despite its hardwired
implementation [14].

Two important questions are answered by SYMBOL concerning
the benefits derived from implementing major parts of an operating
system in hardware. First, it would seem that the overall design costs
of developing a hardware implemented operating system are much
higher than an equivalent software implementation; the desire to lessen
the cost of developing an operating system was not achieved. Software
costs were reduced, but overall costs were not. Traditional software
bug fixes were merely exchanged for a "Request for Hardware
Modification" sheet; the bound RFHMs were over four inches thick -- and
accounted only for changes after the system was delivered "debugged"
to ISU! The second and more positive point is that the implementation
of the hardwired operating system seems to have been very
successful from a performance and programming standpoint. Though the
inflexibility of the hardware often prohibited changes towards more
"modern" operating system concepts, the implementation was very
successful in terms of the original design goals. Using hardware for
heavily used functions such as process scheduling, virtual memory
management, memory allocation, and scheduling of multiple processors
seems to have been a wise tradeoff. It was also shown that complex
hardware can be successfully interfaced to the software part of the
operating system. In terms of the overall design, SYMBOL deserves
recognition as a successful Operating System Machine as much as it
does for being a High Level Language Machine.

A Tale of Two Processors

While hardwired implementation of high level functions has its
merits, a look at two of SYMBOL's processors might prove insightful.
Perhaps the most striking aspect of SYMBOL to a user was the
amazing speed at which programs were compiled (70,000 to 100,000
statements per minute). The SYMBOL Translator [15] is probably the only
example of a compiler implemented entirely with random logic. The
Translator is perhaps the most amazing of SYMBOL's processors not
only because of its tremendous speed of compiling but also in that it
worked at all. One of the benefits of this tremendous translation
speed was that no object files were saved. This was an advantage in
saving storage space and in insuring that object programs always
reflected the current source program.

We do not wish to imply, however, that such speeds are
generally obtainable from a hardwired compiler and a high level
instruction set. The performance figures of SYMBOL's Translator are
somewhat misleading in that the speed came primarily from two other
factors. First, the SPL language [4, 5] had a grammar designed to be easy to
parse. Non-optimal code was generated in one pass with backpatching
and without the need for building compile-time data structures. The
high translation speed could not be expected in a proper
implementation of a compiler for SPL or more complex programming languages.
Second, the Translator did almost nothing more than crude code
generation or assembly. Error diagnostics were next to non-existent,
though in the majority of cases syntax errors in programs were
detected. Our experience suggests that compilers should only be
constructed using a high level programming language. Compiler
complexity can perhaps be attacked more successfully by using modern
compiler writing tools [16, 17] than by developing high level instruction sets.
The poor design of the Translator was undoubtedly due in large part to
the low-level implementation the designer was forced to work with and
the infantile state of compiler technology in the early 1960's.
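One-pass code generation with backpatching can be sketched in a few lines. The example is a generic illustration of the technique, not the Translator's logic: the token names (`if`, `then`, `end`) and the two opcodes (`exec`, `jump_if_false`) are hypothetical. A forward jump is emitted with an unknown target and filled in when the matching `end` is reached, so no syntax tree or other compile-time structure beyond a small stack is required.

```python
def translate(tokens):
    """Emit code in a single left-to-right pass over the tokens."""
    code = []
    pending = []    # indices of jumps whose targets are not yet known
    for tok in tokens:
        if tok == "if":
            continue                       # the condition tokens follow
        if tok == "then":
            pending.append(len(code))
            code.append(["jump_if_false", None])   # target filled in later
        elif tok == "end":
            if not pending:
                raise SyntaxError("'end' without matching 'then'")
            code[pending.pop()][1] = len(code)     # backpatch forward jump
        else:
            code.append(["exec", tok])
    if pending:
        raise SyntaxError("missing 'end'")
    return code

program = translate(["if", "c", "then", "a", "b", "end", "d"])
```

Here the jump emitted at index 1 is patched to target index 4, the first instruction after the conditional body.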

Debugging the Translator hardware was extremely difficult, as
register level flow charts and wire lists proved to be a totally
inadequate form of documenting the conceptual process of translation. In
no way could the design, implementation and debugging of
SYMBOL's Translator have been cost effective compared to a compiler
programmed in a high level language. The hardware dedicated to the
Translator was not cost effective, as the logic was rarely in use and a
similar function could have been performed by the Central Processor.
Perhaps a more reasonable tradeoff would have been to provide the
Central Processor with special purpose hardware to aid with the
various translation functions. This would have had the added benefit of
allowing special purpose hardware to be used for other functions in
addition to translation.

Even more than the Translator, the I/O Processor suffered from the
rigidity of a hardwired implementation. To offload the Central
Processor, the I/O Processor contained a hardwired text editor that
ran extremely quickly. Unfortunately the pushbutton operated editor
was so difficult to use and so primitive that all on-line editing was
done in the Central Processor with software text editors. The strict
separation of the I/O Processor and the Central Processor did not
allow the primitives in the hardwired text editor to be shared by the
software text editors.

Two lessons are evident. First, essential utilities of a system such
as a text editor and compiler need the ability to change and grow,
both to correct bugs and to add new features. The hardwired approach
did not allow the possibility for this growth. The functional division
was at too gross a level, e.g. the specialized hardware in the
Translator provided an all or none service. Second, special purpose
hardware is made flexible by modularizing primitive operations so they
can be controlled by the software. If the sequencing of the primitives
in the I/O Processor had been controllable by software accessible by
the Central Processor, performance of the software editors might have
been much closer to that of the hardwired text editor. Much of the
problem of SYMBOL was that the designers thought they knew how users
would want to use the machine. When this view was changed even
slightly, the hardwired nature of the Translator, Editor and operating
system locked the user into a mold he did not want to be in.

Memory Management: A Case of Strange Bedfellows

One of SYMBOL's unique features was its complex memory organization.
SYMBOL provided direct hardware support both for a paged virtual
memory and for dynamic data structures. The SYMBOL hardware supported
the allocation, deletion and manipulation of storage strings. These
storage strings were constructed by linking together eight-word
groups. Linked lists of such storage strings were used to represent
tree structures which were accessed in SPL as heterogeneous arrays.
The sizes and shapes of these structures were dynamically variable.

The designers of SYMBOL foresaw and attempted to mitigate the adverse
interaction of SYMBOL's unique combination of memory management and
virtual memory. They realized that particular machine functions had
characteristic memory access patterns. For example, the source code
was used in program editing but not at all during execution. In
program compilation, source code and object code were scanned only
once, whereas the name tables were scanned repeatedly. Hence, the
designers decided that each page should be used for a single purpose
and that page lists would be maintained to segregate the pages
according to their use. When memory was allocated, the crude usage
class of the needed space was specified by the hardware. This usage
class determined which page list the system would consult to find the
needed space. SYMBOL maintained three separate page lists: one for
source code, another for object code, and the third for all other
needs. Once any space on a page was allocated, the page was inserted
on the appropriate page list. Henceforth, that page would only be used
for further allocations of space of the same usage class. This scheme
worked well for program editing and for constructing name tables and
object code at compile time. However, at execution time, all data
accessing involved one page list, so there was no advantage to this
scheme at that time.

It would have been worthwhile to experiment with adding more page
lists to SYMBOL: lists of pages used solely for the stack, for
temporaries, or for large structures. This likely would have limited
the scattering of these objects by restricting them to a segregated
set of pages. Unfortunately, implementation of additional page lists
would have required extensive modifications throughout SYMBOL's
Central Processor, and hence was never actually tried.

In SYMBOL, a single large structure could come to occupy small
portions of a large number of pages. There was no mechanism for
compacting these structures. Modifications to the memory allocation
strategy attacked the problem by preventing some of any reclaimed
space on each page from being found, except for expansion of
structures which already occupied a portion of that page. This was
known as the Space Available List (SAL) Threshold technique. ’*
Measurements taken on SYMBOL programs which had had significant paging
activity indicated that this approach reduced the number of page
faults dramatically. The greatest benefits were realized when one
sixth to one fourth of each page was reserved for the above-mentioned
expansion.

Experiments were performed reducing SYMBOL's page size from the built
in 2K-bytes per page to as low as 256 bytes per page. The use of
smaller pages usually reduced the paging activity for a fixed main
memory size. This technique worked whenever severe scattering was
encountered, regardless of its origin. Unfortunately, the use of small
pages could hurt where sequential access to a large body of code or
data was needed. Furthermore, the cost of the overhead associated with
a large number of pages could become significant. Although it would
have contradicted the declaration-free character of SPL, one cannot
help but speculate that the ability to request contiguous allocation
of structures would have reduced paging considerably.
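
The segregation of pages by usage class described in this section can be pictured with a small sketch. The Python below is an illustration only and not SYMBOL's hardware algorithm: the names `Page` and `PagedAllocator`, the page capacity `UNITS_PER_PAGE`, and the first-fit search are all assumptions chosen for brevity. It shows only the central idea, that a page, once claimed for one usage class, serves no other class.

```python
from collections import defaultdict

UNITS_PER_PAGE = 4  # hypothetical page capacity, in allocation units


class Page:
    def __init__(self, number, usage):
        self.number = number
        self.usage = usage            # a page serves one usage class only
        self.free = UNITS_PER_PAGE


class PagedAllocator:
    def __init__(self):
        self.page_lists = defaultdict(list)  # usage class -> list of pages
        self.next_page = 0

    def allocate(self, usage):
        """Return the number of the page that supplies one unit for `usage`."""
        for page in self.page_lists[usage]:   # consult only this class's list
            if page.free > 0:
                page.free -= 1
                return page.number
        page = Page(self.next_page, usage)    # no room left: claim a new page
        self.next_page += 1
        page.free -= 1
        self.page_lists[usage].append(page)
        return page.number


alloc = PagedAllocator()
source_pages = {alloc.allocate("source") for _ in range(5)}
object_pages = {alloc.allocate("object") for _ in range(3)}
other_pages = {alloc.allocate("other") for _ in range(2)}
```

In this sketch the three sets of page numbers never overlap, which is the property that helped during editing and compilation. It also makes the execution-time weakness visible: if every request arrives with the usage class "other", the allocator degenerates to a single list.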
Debugging Software on SYMBOL

An outstanding benefit from the high level nature of the SYMBOL
computer was shown in the efficacy of the debugging tools19,20
produced for the system. Programs were developed to allow the user to
examine the state of his program in detail at the source program
level. For example, at a user-generated interrupt the programmer could
ask the inquire subsystem where the program was executing and have the
statement in execution decompiled for display. The decompilation
process was remarkably effective, and generally differed from the
original source program only with respect to spaces and redundant
parentheses. Since SYMBOL was a descriptor based and tagged
architecture, the current types and values of all identifiers in the
user's program were known.

There was never any need for a programmer to realize that his program
was being translated into an intermediate form for execution. This is
one of the strongest points for the claim that SYMBOL was a High Level
Language Computer System.21 In addition to the benefits that the
machine offered for debugging, the dynamic type checking mechanisms in
the hardware proved very valuable for detecting occasional machine
errors such as trying to use instructions as data or vice versa.
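
The kind of check a tagged architecture makes possible can be sketched in a few lines. The Python below is a toy model under stated assumptions, not a description of SYMBOL's actual tag encoding: the tag names "instruction" and "data", the `Word` record, and the `TagError` trap are all hypothetical. It shows how a tag carried with every word lets the machine refuse to execute data or to compute with an instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    tag: str       # "instruction" or "data" (hypothetical tag names)
    value: object


class TagError(Exception):
    """Raised when a word is used in a way its tag forbids."""


def fetch_operand(word):
    if word.tag != "data":
        raise TagError(f"expected data, found {word.tag}")
    return word.value


def fetch_instruction(word):
    if word.tag != "instruction":
        raise TagError(f"expected instruction, found {word.tag}")
    return word.value


memory = [Word("instruction", "ADD"), Word("data", 7)]


def try_execute(address):
    """Attempt an instruction fetch and report a trap instead of crashing."""
    try:
        return fetch_instruction(memory[address])
    except TagError as err:
        return f"trapped: {err}"
```

Calling `try_execute(1)` in this sketch reports a trap rather than silently interpreting the integer 7 as an operation, which is the class of occasional machine error the paragraph above refers to.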

Debugging Hardware on SYMBOL

One of the questions the implementation of SYMBOL was supposed to
answer was whether or not extremely complex hardware could be designed
and debugged. The answer is that complex hardware can be designed and
debugged but only through the investment of tremendous effort and
time. In 1971 SYMBOL was debugged to the point where it could run
simple programs, yet in 197* bugs were still being found in various
processors. The situation appears to be no different from bugs that
plague software years after a program is developed, even if it is
continuously having bugs removed. The authors' experience with
debugging the SYMBOL system and more conventional software projects
would suggest that bugs in hardware occur in much the same way that
they do in software. However, the problems associated with finding and
curing hardware bugs are far more severe.

Changes to hardware are more time consuming than changes to software.
Modifications to SYMBOL had to be done with extreme care; changes
often had unexpected side effects because the conceptual details of an
algorithm were not documented as they might have been with well
commented software. It was not uncommon to cure the symptom rather
than cure the problem because of this lack of conceptual
documentation. Unlike software, certain changes could not be made
because of physical limitations such as the number of bus pins or the
number of IC packages that would fit on a board. Hardware errors and
bugs were not always deterministic. Because of this non-determinism it
was first necessary to ascertain whether a bug was due to an incorrect
algorithm or if a circuit was failing because of a bad component.

Any similar scale hardware project must make special efforts to
provide the maximum possible effort for developing design and
debugging tools. The state of the art in constructing and debugging
digital systems is far behind the same technology of software systems.
This is probably connected with the limited use of high level
engineering systems such as SCALD22 or DRAW.23 Computer aided
debugging is a necessity. SYMBOL needed the ability to trace and store
the last several thousand operations in real time and have the trace
information analyzed automatically. The limited trace facility on
SYMBOL perturbed the system sufficiently that some errors would go
away when traced, and when a problem could be traced reliably it was
often beyond the ability of a human to read through hundreds of lines
of hex bit patterns to find the offending error.
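
The facility wished for above, a record of the last several thousand operations with automatic analysis, corresponds to what is now commonly built as a ring buffer. The following Python sketch is offered only as an illustration of that idea; the `TraceBuffer` class, the unit labels "CP", "MC" and "TR", and the per-unit summary are assumptions and do not describe any tool that existed for SYMBOL.

```python
from collections import Counter, deque


class TraceBuffer:
    def __init__(self, capacity=4096):
        # A bounded deque keeps only the newest `capacity` entries.
        self.events = deque(maxlen=capacity)

    def record(self, unit, operation):
        self.events.append((unit, operation))

    def summarize(self):
        """Count retained operations per unit, sparing a manual read of dumps."""
        return Counter(unit for unit, _ in self.events)

    def last(self, n):
        """Return the n most recent events, oldest first."""
        return list(self.events)[-n:]


trace = TraceBuffer(capacity=3)
for step in [("CP", "fetch"), ("MC", "read"), ("CP", "add"), ("TR", "scan")]:
    trace.record(*step)
```

With a capacity of three, the first event has already been discarded by the time the fourth is recorded, so `trace.last(2)` returns the two newest entries and `trace.summarize()` counts only what is still retained.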

Von  Neumann  Realities 

SYMBOL is a classic example of a distinctly non-von Neumann
architecture. Features that take it out of the von Neumann class are
the non-contiguous memory structure, automatic memory management,
distinguishability of instructions from data, the self-describing
nature of structures, and the high level instruction set. An early
paper made the comment that

as implemented in the SYMBOL hardware, however, any task
requiring the variable field length processing and storage or the
dynamic structure features of the language should show a
considerable performance gain over conventional software/hardware
systems.3

Experience with SYMBOL suggests that this is probably true, but
unfortunately there were not enough tasks of this type.

The reality was that programs on SYMBOL, as on most computers, tended
to do relatively simple operations. Arithmetic operations were mainly
adding or subtracting very small integers; little use was made of the
99 digit precision controlled arithmetic. Character strings were most
frequently only a single character, and rarely exceeded a dozen
characters in length. While some use was made of dynamically variable
arrays, arrays were almost always homogeneous and remained static once
grown. At the machine level, it hurt a great deal that the memory
structure and decimal arithmetic processor precluded indexing with
address arithmetic. Object code, name tables, and source files were
always static objects after their creation; a better storage
organization for these would perhaps have been a traditional
contiguous linear store. The moral of this story is that the
traditional von Neumann computer is perhaps not so ill-suited to the
operations actually performed by typical programs. The SPL language
and SYMBOL hardware were more powerful than the average user required.
Some of SYMBOL's more advanced features could have been implemented by
software on a traditional machine to achieve a more cost effective
solution to the same problems. Perhaps the conclusions would have been
different in another environment, but SYMBOL was not as much an
advantage over the von Neumann machine as had been hoped earlier.

Microcode 

The hardwired nature of the SYMBOL machine is often criticized for its
inflexibility. Microcoding has been suggested as an implementation
solution that is flexible and still efficient. The understanding of
the authors is that during the 60's when technology decisions were
being made, ROM's suitable for microcode lacked speed, lacked density,
and were prohibitively expensive for the quantities required for
SYMBOL. If one were to design the same processors today, microcoding
is obviously superior to a random logic implementation. Part of the
SYMBOL experiment, however, was to push the limits of a completely
hardwired implementation; microcode would not have accomplished this.
The significant lessons to be learned from SYMBOL are not whether it
should have been microcoded or not, but rather in the lessons learned
about system complexity, refinement of complex systems, debugging of
complex systems, functional division, and instruction set design. In
many instances system software needs to be installation modifiable; a
microcode implementation would generally not fall into this category.

Was SYMBOL Really a HLLCS?

It is crucial to note why we consider SYMBOL to be one of the few real
High Level Language Computer Systems. The SYMBOL machine, with and
only with the software developed for it, meets the HLLCS definition21
because it:

(1) Uses a high level language for all programming, debugging and
other user/system interactions.

(2) Discovers and reports syntax and execution errors in terms of the
high level language source program.20

(3) Does not have any outward appearance of transformations from the
user programming language to any internal languages.

Perhaps the most crucial part of meeting this definition in any system
is being able to debug a program at the source language level. The
SYMBOL architecture facilitated this with high level instructions that
allowed object code to be easily de-compiled back into source, and in
the self-describing nature of all data objects that allowed the
unambiguous interpretation of any data storage. A High Level Language
Computer System is different from and more important than just a
machine with a high level instruction set.
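
To see why a high level object code is easy to turn back into source, consider a toy case. The Python sketch below assumes, purely for illustration, that object code for an expression is a postfix string of operands and operators; the `decompile` function and the `PRECEDENCE` table are hypothetical and are not taken from the SYMBOL software. The sketch regenerates infix text and inserts parentheses only where precedence demands them, so its output can differ from the programmer's original only in spacing and redundant parentheses.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPERAND = 3  # operands bind tighter than any operator


def decompile(postfix):
    """Rebuild infix source text from a postfix token list."""
    stack = []  # entries are (text, precedence of the outermost operator)
    for token in postfix:
        if token in PRECEDENCE:
            right_text, right_prec = stack.pop()
            left_text, left_prec = stack.pop()
            prec = PRECEDENCE[token]
            if left_prec < prec:
                left_text = f"({left_text})"
            # Operators are left associative, so an equal-precedence
            # right operand must keep its parentheses.
            if right_prec <= prec:
                right_text = f"({right_text})"
            stack.append((f"{left_text} {token} {right_text}", prec))
        else:
            stack.append((token, OPERAND))
    text, _ = stack.pop()
    return text
```

For example, the postfix string `a b c + *` comes back as `a * (b + c)`, while `a b + c -` comes back as `a + b - c` with no parentheses at all.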

Conclusion

The existence of the working SYMBOL computer system clearly
demonstrates that a high level instruction set, a compiler, automatic
memory management and a major portion of a time shared operating
system can be implemented successfully in hardware. Use of the SYMBOL
system showed to a lesser degree that the costs of building such a
system are not less than building an equivalent system in software;
that the ability to evolve a system is perhaps more important than
having a very fast functional unit that is never used; that
performance gains from hardwired implementation are easily lost.


SYMBOL taught us a great deal about building complex systems. The top
down design approach made it necessary for the entire system to be
conceived before any of it was implemented; the results show that this
is dangerous. Building complex hardware is prone to the same bugs and
fundamental design errors that plague complex software systems. SYMBOL
contained many excellent and unique solutions to individual problems
but the complex interactions of all of these solutions combined to
make the entire system cumbersome and slow. Refinement and iterative
improvement are steps that most software systems must go through
before reaching acceptable levels of performance and utility; this
step was desperately needed with SYMBOL. Performance could have been
improved perhaps more than an order of magnitude if many of the known
inefficiencies could have been tuned or removed. Despite several
negative comments in this paper, the SYMBOL experience was a very
positive first step in the design of High Level Language Computer
Systems.
ABSTRACT

This paper considers the principal motivations
for a high-level language architecture: Programmer
Productivity, Compiler Simplification, and
Run-Time Efficiency. Individually and collectively,
these motivations do not represent compelling
justification for a departure from
conventional architectures. It is suggested
that a more beneficial architectural departure
is to be found in a lower-level micro architecture
instead of a higher-level architecture.


INTRODUCTION 

The question of the desirability of a high-
level language architecture was asked at the
birth of the stored program digital computer
by Burks, Goldstine, and von Neumann(1),

"In general, the inner economy of the
arithmetic unit is determined by a compromise
between the desire for speed of
operation -- a non-elementary operation
will generally take a long time to perform
since it is constituted of a series
of orders given by the Control -- and
the desire for simplicity or cheapness
of the machine."

Over the years, architectural trade-offs have
been made in favor of selective incorporation
of complex functions in those architectures
where performance was a dominant consideration.
Floating point as an elementary operation was
provided as a hardware operation in the
mid-1950s(2). A variation of the FORTRAN DO loop
was included in the CDC STAR and TI ASC
architectures in the 1970s(3). With vector
instructions included as elementary operations, the
generation of addresses is overlapped with the
operation itself, yielding improved performance,
and a reduction in required memory bandwidth
is achieved by the reduction in the number of
instruction fetches.

A view has been introduced into the discussion
of elementary operation selection. This view
is an observation that a "semantic gap"(4)
exists between the programming language and
the language which the computer actually
executes. The existence of a gap is an invitation
to close the gap.


A recurring idea is the high-level language
architecture which directly executes a selected
language. SYMBOL(5,6) is this type of architecture,
as is the recently discussed Ada processor by
Intel(7). For many reasons, these architectures,
labeled "Type C" by Myers(8), are deemed inefficient.
Most proposals today for a high-level
architecture embrace some intermediate language(9)
as the language to be accepted by the computer.

Proposals for high-level language architecture
are based on achieving three improvements:

1. Programmer Productivity
2. Compiler Simplification
3. Run-Time Efficiency

PROGRAMMER PRODUCTIVITY

Unfortunately, the observation has been made that
closing the gap will have a significant positive
impact on programming cost. This has had the
result of drawing attention away from the real
problem of selecting elementary operations. I
believe that this argument proceeds as follows:

1. The best performance and the minimum
code space result when a problem is
programmed in assembly language.

2. Poor performance and code space result
if a high-level language is used.

3. Programmer efficiency is improved if a
high-level language is used.

Thus, a carefully selected intermediate
execution language, which can be compiler
generated, will give good performance,
reduced code space, and increased programmer
productivity.

Programming costs are a function of the language
and the quality of the support functions provided.
It should make no difference in programmer
productivity whether the support functions are
provided in hardware or software.

Assistance in program debug is a benefit cited
for a high-level language architecture(10) which
should reduce programming cost. I believe that
there is a lesson to be learned today from the
support systems provided for microprocessors.
Program development is moving into a cross
support mode. More and more programs are developed
on a host which is not the computer on which the
program will execute(11). One reason for this
is that powerful debug tools can be provided in
the development software. Only a very small
subset of these tools could be provided in the
hardware of a high-level language architecture;
software support would still be needed. Relating
execution errors during development to the
source program is enhanced more with software
tools than with a meager set of hardware
capabilities.

COMPILER SIMPLIFICATION

A benefit frequently advanced for a high-level
architecture is that a well-selected
intermediate level language significantly
reduces the complexity of the compiler. This
is hard to understand. It can be argued that
these compound elementary operations of the
intermediate language can be defined as macro
subroutines which the compiler can easily
produce. These macros can then be interpreted by
the machine. Again, this becomes a question of
cost and performance. This "soft" intermediate
level language architecture yields all of the
desirable compiler characteristics that a
"hard" architecture does. The Burroughs B1700(12)
is an illustration of this point. Cohen and
Francis(13) describe another system which
executes on conventional microprocessors.

I will not argue that the specification and
use of an intermediate level language is not
beneficial for compiler creation. I do argue
that this language, in total, should not be
implemented in hardware. For those cases
where an intermediate language seems beneficial
to the compilation process, interpretation of
this language is completely feasible, although
slow in execution. The benefits of reduced
code space, including the interpreter,
generally are realized.
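
The notion of a "soft" intermediate language can be made concrete with a small sketch. The Python below is only an illustration under assumed names: the three elementary stack operations, the two compound operations `double` and `increment`, and the `run` interpreter are hypothetical and describe no particular machine. The compiler's target is the compound vocabulary, while the compound operations themselves are defined in software as sequences of elementary ones.

```python
# Elementary operations, standing in for what the hardware provides.
ELEMENTARY = {
    "push": lambda stack, arg: stack.append(arg),
    "add": lambda stack, arg: stack.append(stack.pop() + stack.pop()),
    "mul": lambda stack, arg: stack.append(stack.pop() * stack.pop()),
}

# Compound operations defined in software as macro sequences.
MACROS = {
    "double": [("push", 2), ("mul", None)],
    "increment": [("push", 1), ("add", None)],
}


def run(program):
    """Interpret a program of elementary and compound operations."""
    stack = []
    for op, arg in program:
        # A compound operation expands to its macro body; an
        # elementary operation is executed as it stands.
        for name, value in MACROS.get(op, [(op, arg)]):
            ELEMENTARY[name](stack, value)
    return stack


result = run([("push", 20), ("double", None), ("increment", None)])
```

Moving a macro such as `double` from the `MACROS` table into `ELEMENTARY` would change its cost and speed but not the code the compiler emits, which is the sense in which the soft and hard forms offer the compiler the same target.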

RUN-TIME EFFICIENCY

I perceive that the semantic gap has become
highly visible because of two factors. First,
the non-computational overhead of structured
programming is increasing the run time of our
programs, and second, the execution of operating
system functions is also consuming a highly
visible amount of CPU time. In both of these
cases, the root problem stems from the lack
of a few elementary operations selected to
support these functions, not a closing of a
semantic gap.

Myers(14) provides an interesting comparison
of the concepts of PL/I and the support
provided by the S/370. I believe that in every
case cited by Myers, the issue resolved itself
into the need for the compiler to generate a
body of code which implements the PL/I concept.
This is an issue of elementary operation
selection and the cost performance of the computer.

The cost performance of a computer having more
complex elementary operations is of real concern.

Let me examine the reduction in memory bandwidth
resulting from the inclusion of vector
instructions. Myers(15) describes the case of two 100
by 100 element fixed binary arrays which are to
be added together. A programmed loop would
require 40,004 memory references for instructions
and 30,003 for data, a total of 70,007. A single
vector instruction would require only 30,001
(30,000 for operands and one instruction). An
alternative to this is found in computers such
as the CDC 7600, which has a program buffer cache.
This architecture requires only eight references
to main memory for the instructions and 30,000
references for the data. Vector instructions are
not needed to reduce memory bandwidth if
instruction buffering and a high execution rate are
provided for the elementary operations.
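
The reference counts in this example can be checked with a short calculation. The Python below is a sketch under assumptions: the text gives only totals, so the breakdown used here (four instructions per loop iteration plus four outside the loop, and three data references per element plus three outside the loop) is a guess chosen because it reproduces the quoted total of 70,007 with 30,003 data references. The function names are likewise hypothetical.

```python
ELEMENTS = 100 * 100  # two 100 by 100 arrays, added element by element


def loop_refs(instr_per_iter=4, setup_instr=4, data_per_elem=3, setup_data=3):
    """Instruction and data references for a programmed loop (assumed split)."""
    instructions = instr_per_iter * ELEMENTS + setup_instr
    data = data_per_elem * ELEMENTS + setup_data
    return instructions, data


def vector_refs():
    """One instruction fetch; two operand reads and one write per element."""
    return 1, 3 * ELEMENTS


def buffered_loop_refs(instr_fetched_once=8):
    """Loop held in an instruction buffer: each instruction is fetched once."""
    return instr_fetched_once, 3 * ELEMENTS


loop_total = sum(loop_refs())
vector_total = sum(vector_refs())
buffered_total = sum(buffered_loop_refs())
```

Under these assumptions the buffered loop needs 30,008 references against 30,001 for the vector instruction, a difference of seven references in thirty thousand, which is the point of the comparison.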

The use of compound elementary operations can
reduce the storage requirements for instructions
due to the instructions' higher information
content. In Myers' example, the number of instruction
bytes is reduced from 276 to 13. This is an
impressive reduction! However, if the program
represents 20% of the total memory requirement,
for example, the compound elementary operations
can yield, at best, a 20% reduction in required
total memory space. This small memory savings
may not be worth the increased cost of the CPU.
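
The bound on total savings follows from simple proportion, as the sketch below shows. It is an illustration only: the function name is hypothetical, and it assumes that the program occupies one fifth of memory and that every instruction sequence compresses as well as the 276-byte to 13-byte example, which is certainly optimistic.

```python
def total_memory_reduction(program_share, bytes_before, bytes_after):
    """Fraction of total memory saved when only the program part shrinks."""
    code_saving = 1 - bytes_after / bytes_before
    return program_share * code_saving


# Program occupies one fifth of memory; code shrinks from 276 to 13 bytes.
saving = total_memory_reduction(0.20, 276, 13)
```

Even with a code reduction of better than 95 percent, the saving in total memory stays just under one fifth, because the data that fills the rest of memory is untouched.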

Compound elementary operations to enhance
run-time cost effectiveness are provided at a cost
in hardware, logic, and microcode. The
justification of this cost depends upon the number of
times the function is executed in a program;
frequent use justifies, occasional use does not.
Figure 1 illustrates this point. The higher the
cost of providing a hardware macro, the larger
the use factor must be to achieve a breakeven
cost.


Figure 1
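
The breakeven relation can be stated as a one-line calculation. The Python below rests on a deliberately simple model that is an assumption and is not taken from Figure 1: a hardware macro carries a fixed added cost, and each use saves a constant amount relative to the software equivalent. The function name and the cost units are hypothetical.

```python
def breakeven_uses(hardware_cost, saving_per_use):
    """Number of uses at which a hardware macro repays its added cost."""
    if saving_per_use <= 0:
        raise ValueError("a hardware macro must save something per use")
    return hardware_cost / saving_per_use


cheap_macro = breakeven_uses(hardware_cost=1000, saving_per_use=0.5)
costly_macro = breakeven_uses(hardware_cost=2000, saving_per_use=0.5)
```

In this model, doubling the hardware cost doubles the number of uses needed to break even, which matches the qualitative statement that a costlier macro demands a larger use factor.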


Computer architects can quickly select most of
the elementary operations of their design. The
inclusion of more complex or compound elementary
operations requires knowledge of the intended
use of the computer. Care must be exercised
that static and dynamic statistics collected on
programs run on a unique computer reflect the
true nature of the problem and exclude the
characteristics of the computer(16). For example,
code used for run-time checks will not be
identified with the higher purpose of the code.
Nevertheless, choices are made and computers are
designed and built which are improvements over
prior designs.

For a computer which must be multilingual, that
is, can be programmed in many languages, great
care must be exercised in the selection of
compound elementary operations which will be
useful for all the languages. The result of
implementing the intermediate language in
hardware can be a loss of generality. An
intermediate language for COBOL is not likely
to be the same language for FORTRAN or PASCAL.
And what does one do when Ada becomes popular?
Will the intermediate language support the
new programming language efficiently?

Figure 2 illustrates the problem which is
created as the language implemented by the
hardware approaches the programming language,
closing the semantic gap. In a conventional
processor, the high-level language is compiled
into machine language which is interpreted
by the hardware. As the machine
language approaches the programming HLL,
the machine languages will diverge and
become two or more different machine
languages if the semantic gaps are completely
closed.
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A  writable  control  store  with  program  accass  to 
sequences  of  microcode  Is  one  technique.  This 
will,  In  effect,  provide  for  the  Interpretation 
of  the  compound  elementary  operations  by  micro¬ 
code.  Substantial  Improvement  In  program  execu¬ 
tion  time  can  result('8, 19,20) ,  The  compiler 
should  be  able  to  make  a  selection  of  those  com¬ 
pound  elementary  operations  which  are  Interpreted 
by  the  machine's  elementary  operations  end  those 
which  are  to  be  interpreted  by  microcode.  Pro¬ 
duction  runs  of  a  program  can  further  adapt  the 
mix  to  achieve  the  fastest  execution  rate. 

A  second  technique,  and  ona  which  Is  attractive 
for  Implementation  In  VLSI,  Is  the  use  of  com¬ 
pound  function  attached  processor* (21 ) ,  A  float¬ 
ing  point  chip  and  an  FFT  buttarfly  chip  which 
can  be  attached  to  a  microprocessor  ere  examples. 
A  Decimal  String  Chip  would  be  useful  for  e 
microprocessor  executing  e  heavy  COBOL  load. 

1  will  concede  that  there  may  be  a  place  In 
computer  architectures  for  the  inclusion  of 
hardware  employed  to  Improve  the  reliability  of 
software  In  execution.  The  run-time  environment 
creates  problems  which  cannot  be  anticipated  by 
the  compiler  or  require  high  checking  overhead. 
This  Issue  should  be  addressed  as  a  stand-alone 
Issue  end  should  not  be  combined  with  the  Issue 
of  a  high-level  language  architecture. 

The ultimate architecture approach was suggested,
I believe, by McKeeman in 1967(22):

"The obvious attack for programmers and
hardware people together is to devise
language that reflects what we want to do
and how we do it (for instance, in parallel)
and machine structures effective in handling
that language. Let us call this method
'language directed computer design.'"


Kavipurapu and Cragon are conducting a
search for common elements and their frequency
of use in FORTRAN, COBOL, and PASCAL
to see if there are a few compound operations
which will benefit all three languages. I
believe that there is a good chance that a
small number will be found that, if implemented
in hardware, will substantially
improve a computer's code space and execution
time. Success in finding a few is not
a mandate to implement everything in an
intermediate language.

A high-level or intermediate language implemented
in hardware is too restrictive and
costly. However, selective implementation
of a small set of compound elementary operations
can substantially improve the performance
of a computer. The question facing
computer architects today is not high-level
language architectures, but architectures
which permit the inclusion of selected
compound elementary operations which
match the use environment at any given
time.


In the future, the language referred to by
McKeeman must mean nonprocedural programming
techniques(23, 24). The machine structures will
be microprogrammed in nature. The architecture
will be capable of either interpreting a "soft"
intermediate language or executing a compiled
microprogram. With memory becoming the least
costly component, compiled microcode will become
more and more cost effective. If a lower performance
is satisfactory, then the interpreted
soft intermediate language can reduce memory
cost. I believe that there is no "ideal DEL";
there may be a DEL for every nonprocedural
language and this DEL can be interpreted on a
soft architecture if memory cost is to be minimized.

CONCLUSIONS 

A case has not been made for the creation of
new architectures which implement high-level
or intermediate level languages. All of the
benefits can be achieved without the loss of
generality by selective implementation of some
compound elementary operations in callable microcode
or attached processors. The ultimate architecture
will be a lower-level one, not, as many
advocate, a higher-level one.

ABSTRACT

In this paper, the design goals of
direct execution database computers are
stated. Using an existing database management
software system, the paper attempts
to show that the replacement of the software
system with a hardware database computer
may not obtain uniform performance gains
and storage savings. This discovery may
render the original design goals overly
ambitious.

On the other hand, the complicating
factors which hinder the gains and savings
may contribute to the antique modes of
database management of conventional software
systems. To this end, the paper
attempts to isolate these factors and
identify the modes of operation for
consideration.


1. DESIGN GOALS

Normally, the effective use of a database
system by a user requires the user to be familiarised
with the languages of the database computer
system. There are essentially two such languages:
the database definition language (DDL) and the
database manipulation language (DML). DDL allows
the user (especially, the database administrator
or database owner) to define the logical and physical
properties of the database. Logical properties
of a database are characterised by the database
models used. For example, in the relational
model [1], the logical properties of the database
consist of attributes and domains (of a tuple),
tuples (of a relation), primary keys (to the tuples)
and relations (of the database). In the hierarchical
model [2], the logical properties consist of
field names and values (of a segment), sequence
fields, primary and secondary indices (of segments),
segments (of a type), types (of a parent-child
relationship) and relationships (of the
database). Likewise, there are logical properties
of CODASYL databases [3]. By defining information
entities in terms of logical properties of a database
model, the user can capture the information
content in the database and make (symbolic) references
to the information entities.

DDL also allows the user (especially, the
database designer) to define the physical properties
of the database. Physical properties of a
database are those which deal with units of storage
(say, number of pages and page size), kinds of
storage (e.g., moving-head disks vs fixed-head
disks), storage formats of the logical entities
(directory format for indices, pointers for related
tuples or segments and encodings for repeated
attributes or field names) and access modes
(e.g., access by direct address calculation, via
intermediate records or by way of directories).

Because modern databases are meant to be
shared, the database system must provide concurrent
access and multi-user operations. DDL of a
modern database system must therefore provide a
means to allow the database owner (or administrator)
to authorise and validate certain users of
his database, define different portions of the
database for different users (e.g., by creating
different views of the same database), specify
the types of control operations permitted or denied
on the authorised portions, and place procedures
(e.g., programs written by the administrator
or owner) at the points of access paths to his
database (say, at each file opening time).

On the other hand, the database manipulation
language (DML) is primarily concerned with the
specification of search, retrieval, update, and
processing requirements of the database. Because
the use of data models enables the information
content to be captured in the database, the modern
DML enables the user to address the database by
content for search, retrieval, update and processing
operations. Content-addressing is accomplished
in DML as expressions of predicates. For
example, the following is a simple expression of
three predicates, namely, a conjunction of an
equality predicate, an inequality predicate and a
greater-than predicate.

(Type = EMPLOYEE) ∧ (Emp-Dept ≠ TOY) ∧ (Salary > 20,000)

which specifies those records of the
employees who are not in the toy department and
have salaries greater than 20,000. By referring to
specific attributes, providing the necessary
predicates, and specifying the intended operations
in DML, the user can manipulate the database
effectively at various granularities of the database
(i.e., at field or attribute-value pair
level, tuple or segment level, relation or segment
type level, and relationship level).
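
The predicate expression above can be read as a filter over records made of attribute-value pairs. The following is a minimal sketch of that reading, assuming records are plain Python dictionaries; the sample records and the helper name `matches` are hypothetical and are not part of any DML.

```python
# Hypothetical sample records; attribute names follow the expression in the text.
records = [
    {"Type": "EMPLOYEE", "Emp-Dept": "TOY", "Salary": 25000},
    {"Type": "EMPLOYEE", "Emp-Dept": "SHOE", "Salary": 30000},
    {"Type": "EMPLOYEE", "Emp-Dept": "SHOE", "Salary": 15000},
    {"Type": "DEPARTMENT", "Emp-Dept": "SHOE", "Salary": 0},
]


def matches(record):
    """Conjunction of an equality, an inequality and a greater-than predicate."""
    return (
        record["Type"] == "EMPLOYEE"
        and record["Emp-Dept"] != "TOY"
        and record["Salary"] > 20000
    )


# Content-addressing: records are selected by what they contain, not by location.
selected = [r for r in records if matches(r)]
```

Only the second sample record satisfies all three predicates at once; the others each fail one conjunct.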

The goals of high-level language database
machine designers are therefore to be able to come
up with high-performance and great-capacity computer
architectures which allow direct execution
of DDL and DML statements of the user application
programs. Direct execution of user programs enables
the performance and capacity gains of the new
machine to be contributed to the user in terms of
high-volume management and quick response which are
difficult to achieve in conventional software-laden
computers for very large database applications.

This difficulty is due largely to the fact that
conventional computers are not designed specially
for database management. Consequently, very elaborate
software for database management must be supported
on the computers. The execution of very
complex and sizable database management software
tends to deplete system resources and provides inadequate
responses to user applications.

Can we design direct execution database computers?
In other words, are there complications
in reaching our design goals?

2. ISSUES COMPLICATING DIRECT EXECUTION


There are at least two issues which have complicated
the design goals of direct execution
database computers. One issue is related to DDL;
the other is concerned with DML. These two issues
may render the direct execution of DDL and DML
statements for conventional database management
applications ineffective.

The most illustrative way to study these complications
is perhaps by focusing our attention on
a specific database model and a certain high-level
language database computer design. Here, we
focus on the hierarchical model [2]. We choose the
DDL and DML of IBM's Information Management System
(IMS) for study [4-7]. Presently, IMS is a widely
used hierarchical database management software
system. For database computer hardware designs, we
choose the database computer (DBC) which has been
proposed [8, 9] to support, among other database models,
the hierarchical model of databases.
However, many of the findings produced in the following
sections are valid for other models and machines
which, although not elaborated here, can be
found in [10, 11, 12, 13].


2.1 Direct Execution of DDL Statements for Creating
New Databases


Directly executable DDL statements for hierarchical
databases must be available so that, given
the logical properties of a hierarchical database,
the DDL statements, upon execution, can automatically
generate the physical structure of the database
for storage. Furthermore, the physical
structure generated must take full advantage of
the strong points and new capabilities of the
database computer.

Let us review briefly the logical properties
of an IMS database and present a (hardware) transformation
algorithm (as designed for DBC) which
converts the logical organisation of an IMS database
into a physical structure for database computer
storage. We will also mention briefly some
strong points and new capabilities of the database
computer.


2.1.1 Logical Properties. An IMS database consists of
hierarchically related segment occurrences (or
simply, segments), each of which belongs to a
segment type. In the example of Figure 1, segment
type A, the root segment type, has three occurrences.
All others are dependent segment types,
each having a unique parent segment type and zero
or more child segment types. Some relationships
among the various segments in our examples are:

A1 is the parent of B1 and G1.

H1, H2 and I1 are children of G1.

J1 and J2 are twins.

H1, H2, I1, J1 and J2 are descendants
or dependents of G1.

A1, G1 and I1 are ancestors of J1.

Successive levels are numbered such that a root
segment is at level 1. All segment occurrences
are made of one or more fields.

An IMS database is traversed in the order:
parent to child, front to back among twins and
left to right among children. The traversal
sequence for the database of Figure 1 is (A1, B1,
C1, D1, D2, D3, E1, F1, E2, F2, «, G1, H1, H2, I1,
J1, J2, A2, A3). Notice that the traversal sequence
defines a next segment with respect to a
given segment. A hierarchical path is a sequence
of segments, one per level, starting at the root,
e.g., (A1, G1, I1, J2).
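
The traversal order and the notion of a hierarchical path can be illustrated with a small sketch. It does not reproduce Figure 1; it assumes only the fragment of the hierarchy implied by the relationships listed above (A1 is the parent of B1 and G1; H1, H2 and I1 are children of G1; J1 and J2 are twins under I1). The dictionary layout and function names are hypothetical.

```python
# Hypothetical fragment of a hierarchy: each parent maps to its children,
# listed left to right (twins front to back).
children = {
    "A1": ["B1", "G1"],
    "G1": ["H1", "H2", "I1"],
    "I1": ["J1", "J2"],
}


def traverse(segment):
    """Return segments in traversal order: parent first, then each child subtree."""
    order = [segment]
    for child in children.get(segment, []):
        order.extend(traverse(child))
    return order


def hierarchical_path(root, target):
    """Return the path from root to target, one segment per level, or None."""
    if root == target:
        return [root]
    for child in children.get(root, []):
        path = hierarchical_path(child, target)
        if path is not None:
            return [root] + path
    return None


sequence = traverse("A1")
path = hierarchical_path("A1", "J2")
```

In this fragment the "next segment" after G1 is H1, and the path to J2 passes through one segment at each of the four levels.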

2.1.2 Automatic Generation of Storage Structure.
An IMS database with the above logical properties
can be defined in DDL statements which upon
execution transform the database into proper storage
format of the database computer (i.e., DBC).
Because DBC does not address physical records by
locations, location-dependent pointers are not
used by DBC for the purpose of facilitating hierarchically
related records. Instead, physical records
are content-addressed by DBC provided that
the content of a physical record is presented as
one or more variable length attribute-value pairs,
known as keywords. Thus, an IMS database is transformed
by considering every IMS segment as a physical
record (or, simply, record) composed of keywords.

An IMS segment includes a sequence field
whenever it is necessary to indicate the order
among the twin segments. Since each segment becomes
a record and no address-dependent pointers
are allowed, the database computer assigns a symbolic
identifier to each segment, identifying it
uniquely from all other segments in the database.




Figure 1. Logical Organisation of an IMS Database

The symbolic identifier of a segment S is a group
of fields consisting of:

(1) the symbolic identifier of the
parent of S, and

(2) the sequence field of S.

Since the sequence fields of different segment
types may use the same field name, we must qualify
the field name with the segment type.

The creation of a record from an IMS segment
can now be accomplished by forming keywords as
follows.

(1) For each field in the segment,
form a keyword using the field
name as the attribute and the field
value as the value.

(2) Form a keyword of the form <TYPE,
segtype> where TYPE is a literal
and segtype is the segment type
in consideration.

(3) For each sequence field in the
symbolic identifier of the segment,
form a keyword using the
field name (qualified by the
segment type) as the attribute
and the field value as the value.

For example, for an IMS database shown in Figure
2, the attribute templates of the five collections
of records corresponding to the five segment types
are shown in Figure 3. Qualified field names such
as Prereq.Course # are used to distinguish the
same field names, i.e., Course #, among different
segment types.

2.1.3 Execution Gain vs Storage Penalty.
Due to the simplicity of the transformation algorithm,
it is not surprising that DDL statements,
which allow the user to specify logically a hierarchical
database and provide automatic generation
of the database, can be readily realised in the
hardware and be executed directly to yield a collection
of physical records of keywords for storage.
Keywords enable the database computer (DBC)
to content-address the records in the database
which contain these keywords. Thus, the hardware
realisation of DDL statements indeed utilises the
strong points and new capabilities of the database
computer.

However, in examining the layouts of Figures 2
and 3, we note that the use of symbolic identifiers
to capture the parent-child relationships
may increase the storage requirement of the physical
records. Furthermore, as the levels of the
hierarchy develop, the storage requirement of the
physical records may increase "exponentially".
This is evident from the following observation: that
for each record representing a dependent segment the
physical record must include additional storage
space for:

(1) qualifications of the field
names, and

(2) sequence fields of its ancestors.

For example, in Figure 3 a physical student record
at level 3 must accommodate the qualified
name, Student.Emp #. In addition, the student
record also must include the sequence field (i.e.,
the date field) of its parent (i.e., a certain
offering record) as a keyword. Since the student
is a child of a certain course record whose sequence
field is course number (i.e., Course #),
there is a keyword in the student record whose
attribute is Course #. The inclusion of course
number, date, and qualifications in the student
records increases the storage requirement of the
hierarchical database considerably. On the other
hand, the inclusion of sequence fields as keywords
in records eliminates the need of pointer spaces
which were necessary in the IMS segments for the
purpose of linking all the twins of a given parent
sequentially. Despite such trade-off of spaces,
analysis has shown [14] that the increase may be '1'*
per level starting at level 4. Similar findings
on storage loss due to new database machine requirement
are obtained in relational as well as
CODASYL modelled databases.

Figure 3. The [...] of Physical Records of the
Segments of Figure 2 (identifier is underlined)


It is not clear whether it is possible to devise a
hardware transformation algorithm which is as simple
as the one mentioned above and which can yield
storage gains. Until such an algorithm is found,
direct execution of DDL statements for database
creation in the new database computer environment
may actually cause a loss in storage.
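
The segment-to-record transformation of Section 2.1.2, and the storage growth it causes at deeper levels, can be sketched as follows. This is an illustration under stated assumptions, not the DBC algorithm itself: segments are modelled as Python dictionaries, every sequence field in the symbolic identifier is qualified with its segment type (the text qualifies names where they would otherwise clash), and all field values are hypothetical.

```python
def symbolic_identifier(segment):
    """Parent's symbolic identifier followed by this segment's own sequence field."""
    parent = segment.get("parent")
    prefix = symbolic_identifier(parent) if parent else []
    name = segment["sequence_field"]
    return prefix + [(segment["type"], name, segment["fields"][name])]


def segment_to_record(segment):
    """Form the keywords (attribute-value pairs) of the record for one segment."""
    # Rule (1): one keyword per field of the segment.
    keywords = [(name, value) for name, value in segment["fields"].items()]
    # Rule (2): a keyword carrying the segment type.
    keywords.append(("TYPE", segment["type"]))
    # Rule (3): one keyword per sequence field in the symbolic identifier,
    # with the field name qualified by its segment type.
    for seg_type, name, value in symbolic_identifier(segment):
        keywords.append((f"{seg_type}.{name}", value))
    return keywords


# Hypothetical three-level example: Course -> Offering -> Student.
course = {"type": "Course", "sequence_field": "Course #",
          "fields": {"Course #": "M23", "Title": "MATH"}, "parent": None}
offering = {"type": "Offering", "sequence_field": "Date",
            "fields": {"Date": "730813", "Location": "CAMBRIDGE"}, "parent": course}
student = {"type": "Student", "sequence_field": "Emp #",
           "fields": {"Emp #": "102141", "Grade": "A"}, "parent": offering}

course_record = segment_to_record(course)
student_record = segment_to_record(student)
```

Both sample segments have two fields, yet the level-1 record carries four keywords while the level-3 record carries six: each additional level adds one inherited sequence field, which is the storage penalty discussed above.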

2.2 Direct Execution of DML Statements
for Database Transformation

In IMS, the database manipulation language
(DML) statements known as DL/I calls have the following
format:

Operation list

where the Operation is one of insert (ISRT), delete
(DLET), replace (REPL) and get (GET) calls, and
where the list is a number of segment search predicates,
at most one per level, which are used to
select a hierarchical path. Each segment search
predicate is preceded with the name of the segment
type. Let us denote the segment search predicate
at level i as Si.

After each retrieval or insertion operation, a
segment is "established" in the traversal sequence
of the IMS database. For a retrieval operation,
this segment refers to the segment just retrieved;
for an insertion operation, this segment refers to
the segment just inserted. Such a segment in the
traversal sequence is termed the current position in
the database. There are several forms of the get
call, each of which returns a single segment. A
get-unique (GU) call retrieves a specific segment
at level n by starting at the root segment type,
finding the first segment at each level i satisfying
Si, and finally retrieving the segment satisfying
Sn. A get-next (GN) call starts the search at
the current position in the database and proceeds
along the traversal sequence satisfying Si for all
i and retrieving the segment satisfying Sn.

We shall illustrate the manner in which get-unique
(GU) and get-next (GN) calls are executed
by the database computer. Referring back to the
IMS database of Figure 2, let us suppose that the
DL/I call to be processed is:

GU  Course   (Title = 'MATH')
    Offering (Location = 'CAMBRIDGE')
    Student  (Grade = 'A')

This asks for the first Student segment of the
database which satisfies the predicate Grade = 'A',
and which has a parent segment Offering with Location
= 'CAMBRIDGE' whose parent, in turn, is a
Course segment with Title = 'MATH'. The call is
executed as follows:

(1) Starting with the first segment
search predicate, i.e., Title =
'MATH', the Course segments which
satisfy the predicate are retrieved
by utilising the query
formulated by the machine

((Type = COURSE) ∧ (Title = MATH))

and are sorted by the machine
according to the value of their
sequence field, i.e., by the attribute
Course #.

(2) If no Course segment exists, then
the DL/I call is unsuccessful.
Otherwise, the first Course segment
is found and designated as the
current Course segment.

(3) The Offering segments are then retrieved
with the predicate Location =
'CAMBRIDGE' and sorted by their sequence
field, i.e., date. If the
sequence field of the current Course
segment is (Course #, C), then the
query used by the machine for this
content-addressing is

((Type = OFFERING) ∧ (Course # = C)
∧ (Location = CAMBRIDGE)).

(4) If no Offering segment exists, then
the current Course segment is removed
and control is transferred to
Step 2. Otherwise, the first Offering
segment is designated as the
current Offering segment.

(5) The Student segments are then retrieved
with predicate Grade = 'A'
and sorted by their sequence field,
i.e., by Emp #. If the sequence
field of the current Course segment
is (Course #, C) and that of
the current Offering segment is
(Date, D), then the query used by the
machine for this round of content-addressing
is

((Type = STUDENT) ∧ (Course # = C)
∧ (Date = D) ∧ (Grade = A)).

(6) If no Student segment exists,
then the current Offering segment
is removed and control
is transferred to Step 4.
Otherwise, the first Student
segment is designated as the
current Student segment.

(7) The DL/I call is successfully
executed and the current
Student segment is returned.
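
The seven steps above can be sketched in a few lines. This is a hedged illustration only: the content-addressable store is simulated by a Python list searched with equality conjuncts, the nested loops stand in for the "remove the current segment and go back" steps, and the sample records, `content_address` and `get_unique` are hypothetical names and data rather than anything defined by IMS or DBC.

```python
# Hypothetical records of keywords; values are illustrative only.
RECORDS = [
    {"Type": "COURSE", "Course #": "M16", "Title": "MATH"},
    {"Type": "COURSE", "Course #": "M23", "Title": "MATH"},
    {"Type": "OFFERING", "Course #": "M16", "Date": "730813", "Location": "OSLO"},
    {"Type": "OFFERING", "Course #": "M23", "Date": "741104", "Location": "CAMBRIDGE"},
    {"Type": "STUDENT", "Course #": "M23", "Date": "741104", "Emp #": 421633, "Grade": "B"},
    {"Type": "STUDENT", "Course #": "M23", "Date": "741104", "Emp #": 761620, "Grade": "A"},
]


def content_address(conjuncts, sort_key):
    """Return every record satisfying all equality conjuncts, sorted by sequence field."""
    hits = [r for r in RECORDS if all(r.get(a) == v for a, v in conjuncts.items())]
    return sorted(hits, key=lambda r: r[sort_key])


def get_unique(title, location, grade):
    """Follow steps (1)-(7): return the first qualifying Student record, or None."""
    # Step 1: content-address the Course records, sorted by Course #.
    courses = content_address({"Type": "COURSE", "Title": title}, "Course #")
    for course in courses:  # steps 2 and 4: take the next current Course
        c = course["Course #"]
        # Step 3: Offering records under the current Course, sorted by Date.
        offerings = content_address(
            {"Type": "OFFERING", "Course #": c, "Location": location}, "Date")
        for offering in offerings:  # steps 4 and 6: take the next current Offering
            d = offering["Date"]
            # Step 5: Student records under the current Course and Offering.
            students = content_address(
                {"Type": "STUDENT", "Course #": c, "Date": d, "Grade": grade}, "Emp #")
            if students:
                return students[0]  # step 7: the call succeeds
    return None  # step 2: no Course segment is left, the call is unsuccessful


result = get_unique("MATH", "CAMBRIDGE", "A")
```

Note that each level costs one content-addressed query per candidate parent, which is why the retrieval of a single segment gains little from the hardware.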

It should be noted at this point that
the content of the work space of the machine established
by the above GU call may be used to execute
the next DL/I call, for example, to retrieve
the next student who has an A grade in a math
course offered in Cambridge. This is depicted by
the following get-next (GN) call:

GN  Course   (Title = 'MATH')
    Offering (Location = 'CAMBRIDGE')
    Student  (Grade = 'A')

In this case, the relevant segment may already be
present in the work space of the machine. The
current Student segment is removed and control is
transferred to Step 6 given for the GU call.

On the other hand, if the GN call is:

GN  Course   (Title = 'MATH')
    Offering (Location = 'CAMBRIDGE')
    Student  (Grade = 'F')

then only existing Course and Offering segments
may be used. However, it is necessary that the
next Student segment returned should not precede
the current Student segment in the traversal sequence.
Hence, if the sequence field of the current
Student segment is (Emp #, E), that of the
current Offering segment is (Date, D), and that of
the current Course segment is (Course #, C), then
the following machine query is used for content-addressing
the next set of Student segments:

((Type = STUDENT) ∧ (Course # = C) ∧ (Date = D) ∧
(Emp # ≥ E) ∧ (Grade = F))

The previously existing Student segments are removed
and control is transferred to Step 6 given
for the GU call.
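
A small sketch of how such a GN query might be formed and evaluated is given below. It assumes that the comparison on Emp # is "greater than or equal to", as the requirement that the next Student segment not precede the current one suggests; the function names and sample records are hypothetical.

```python
import operator


def gn_student_query(course_no, date, emp_no, grade):
    """Conjuncts selecting the next set of Student records under the current parents."""
    return [
        ("Type", operator.eq, "STUDENT"),
        ("Course #", operator.eq, course_no),
        ("Date", operator.eq, date),
        ("Emp #", operator.ge, emp_no),  # assumed: must not precede the current Student
        ("Grade", operator.eq, grade),
    ]


def satisfies(record, query):
    """True when the record meets every (attribute, comparison, value) conjunct."""
    return all(attr in record and op(record[attr], value) for attr, op, value in query)


# Hypothetical Student records under one Course and one Offering.
students = [
    {"Type": "STUDENT", "Course #": "M23", "Date": "741104", "Emp #": 421633, "Grade": "F"},
    {"Type": "STUDENT", "Course #": "M23", "Date": "741104", "Emp #": 761620, "Grade": "A"},
    {"Type": "STUDENT", "Course #": "M23", "Date": "741104", "Emp #": 800000, "Grade": "F"},
]

# The current Student is Emp # 761620; ask for the next students with grade F.
query = gn_student_query("M23", "741104", 761620, "F")
next_set = sorted((s for s in students if satisfies(s, query)), key=lambda s: s["Emp #"])
```

The failing student with the smaller employee number is excluded because that segment precedes the current position in the traversal sequence.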

Finally, if the GN call is

GN  Course   (Title = 'HISTORY')
    Offering
    Student

then no currently existing segments are useful.
Hence, new sets of segments must be retrieved, one
set for each level.

2.2.1 [...] From the above discussion, it is not
surprising to learn that directly executable database
manipulation (DML) statements of the following
types of transactions will produce the "best"
performance for the database computer over the
conventional, software-laden IMS system.

Transaction requirement:

(1) Find all segments satisfying
given predicates.

(2) The predicate at the root level
does not involve the sequence
field.

(3) No predicate is given at any
intermediate level.

Example: Find all those students who failed
a mathematics course regardless of the location at
which the course was offered.

      GU  Course   (Title = 'MATH')
          Offering
          Student  (Grade = 'F')

Loop  GN  Course   (Title = 'MATH')
          Offering
          Student  (Grade = 'F')

      GO TO Loop

Let N be the number of root segments (i.e.,
courses). All of the root segments satisfying
the predicate are content-addressed. For each of
these root segments, all y of its third level
twins satisfying the predicate are then content-addressed.
We also assume that these third level
segments (i.e., those students who received grade
F) are scattered evenly. The relative performance
is charted in Figure 4. The entries of the chart
are computed as the ratio of page accesses (to IMS
segments in the old software-laden environment) to
block accesses (to physical records in the new
database computer environment).

Due to very large content-addressable block
size (approximately 1/2 megabytes) and relatively
small sequential-addressable page size (about 2
kbytes), this type of transaction may yield one or
two orders of magnitude of performance gain over
the conventional system.
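
As a rough, back-of-the-envelope check on the order of magnitude (not the computation behind Figure 4), one can compare how much data a single access covers in each environment. The sketch below assumes 1/2 megabyte means 512 x 1024 bytes and 2 kbytes means 2 x 1024 bytes; the variable names are hypothetical.

```python
BLOCK_BYTES = 512 * 1024  # content-addressable block, "approximately 1/2 megabytes"
PAGE_BYTES = 2 * 1024     # sequentially addressed page, "about 2 kbytes"

# Number of pages whose data fits in one content-addressable block.
pages_per_block = BLOCK_BYTES // PAGE_BYTES
```

Under these assumptions one block spans as much data as 256 pages, so when the qualifying records are spread over many pages a single block access can stand in for up to a few hundred page accesses, which is consistent with the one to two orders of magnitude quoted above.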

2.2.2 Where are the Performance Gains? Now
let us consider another type of transaction as
follows:

Transaction requirement:

(1) Find a single segment satisfying
the given predicates.

(2) A predicate involving the sequence
field is given at root
level.

Example: Find the student with employee
number 50, taking a CIS 211 course
in Columbus. We note that
course numbers are sequenced.

GU  Course   (Course # = 'CIS 211')
    Offering (Location = 'Columbus')
    Student  (Emp # = 50)

The performance gains of this type of transaction
are charted in Figure 5. It is disappointing to
note that the performance of the database computer
for this type of transaction is not much better
than the conventional software-laden system.

2.2.3 Performance Gains vs. Transaction
Type. By comparing the examples presented in the
previous two sections, it is evident that the new
hardware of the database computer will not yield
significantly better performance over the software
system, if the user transactions demand records in
a sequential manner and receive them one record
at a time. On the other hand, if for a user
transaction, the demand is of high volume and the
search criteria of the demand
are made of predicates which require content-addressing
instead of sequential accessing, then
the strong points of the database computer hardware
can indeed yield high performance. Ideally
one would want to come up with a design of high-performance
and great-capacity database computer
which can provide effective and efficient solutions
to either low-volume and sequential database
manipulation or the high-volume and content-addressable
database manipulation. Such a design is
not in sight.

3. CONCLUDING REMARKS

Direct execution of existing high-level
database definition and manipulation language constructs
may not be desirable. The undesirability
is due to the lack of good database computer design
for uniform gains in storage requirement and
transaction execution. In other words, special-purpose
database computers may not be able to
bring about the high hope of anticipated throughput
gains which have been the design goal of the
database computers in the first place.

Nevertheless, database computers which are
capable of directly executing database definition
and manipulation language constructs will stay.

Their impact will be twofold. First, database application
programming will change. The change
will primarily be prompted by the advanced features
provided by the machines which are not
otherwise adequately available in conventional software
systems. For example, security and integrity
checks and concurrency controls can be more
effectively and efficiently introduced as hardware
mechanisms. The use of high-volume and content-addressable
search and update for very large databases
is another need for hardware realisation.
These advanced features will allow existing databases
to migrate to a new database machine environment
with newly written application programs. On
the other hand, there is not much that the new
machine can improve for the old application programs.
However, with some interfacing software,
the existing application programs can still be run
on the new environment without the need of program
conversion. It is hoped that in the long run the
database application will be dominated by the newly
written application programs.

Secondly, the presence of the database machines will have an
important impact on the future development of database
definition and manipulation languages. Despite their claim
of data independence (i.e., devoid of database software and
hardware implementation issues), the languages were designed
with certain known processing modes and underlying
technology of the time. As a new technology with a high
degree of parallelism and content-addressability, the
database computer will require new database definition and
manipulation languages to be highly concurrent and
associative. Furthermore, the new languages should have an
integrated approach to the specification and control of
security and integrity checks of database access and update.
Thus, the study of database computer design will also prompt
our investigation of new DDL and DML for the computers.

An architecture of implemented hashing hardware to be used
in symbol manipulation is presented. The major components of
the hashing hardware are a hash addressing unit and hash
table memories, which can also be used as main memory of the
system. The hardware makes use of parallel read-out and
comparison mechanisms of logic-in-memory banks. Basic
hashing algorithms such as search, insertion and deletion of
keys are realized by microprogram control. Performance
improvements ranging 9 - 13 times are obtained over pure
software hashing. The application techniques of hashing
hardware to symbol table manipulation, property list
handling and set operations are given. The advantages of
hashing over associative memories in these applications are
also discussed.

1.  Introduction 

Hashing plays an important role in speeding up table look-up
operations. It is extensively used, not only in traditional
language translation, i.e. assembling and compiling, but in
symbol manipulation at large, e.g. formula manipulation,
execution of a Lisp dialect, and associative processing.

Although hashing is the fastest among known methods for
searching a table of N items in terms of computational
complexity (O(1) compared with O(log N) of binary search,
for example), the constant time factor due to calculation of
hash address sequences is not small in software hashing, and
in some cases hashing gives way to alternative techniques.
Moreover, to avoid rapid degradation of the performance, the
table utilization must be limited to far less than the
total capacity, say 70-80 %.

To overcome these difficulties, we proposed parallel hashing
schemes in which n independent hash address sequences are
used to access a hash table organized as a b by P
two-dimensional array (b columns, to be called memory banks,
are accessed in parallel) (n ≤ b) (cf. Fig. 1), and
presented performance analyses [5]. The results of the
analyses assured us that a successful search takes on
average fewer than 1.18 table look-ups with n=b=4, or even
1.05 with n=b=32, until the load factor of the table gets as
high as 0.9.

Based on the analyses, we realized a parallel hashing scheme
on an experimental system, to be used for symbol
manipulation. In sections 2-5, we discuss the architecture
and the performance of the implemented system.

The fact that basic hash table look-up operations can be
done with speed comparable to single indirect addressing
encourages more extensive use of hashing in new areas of
application. In section 6, we explain how several important
algorithms in symbol manipulation are speeded up by the
hashing hardware.

2.  Initial  Design  Considerations 

Our problem domain is symbol manipulation where tables (data
bases) to be searched are taken in main memory and accessed
by hashing algorithms such as given in chapter 4 of Knuth.

Our approach is

(1) to build into the memory-CPU interface parallel
    mechanisms of (hash) addressing and data (key)
    comparison,

(2) to incorporate hardware logic to compute hash addresses
    into the address formation unit in the CPU, and

(3) to replace the hashing control sequencing (traditionally
    done by software) by faster logic, i.e. microprogramming.


Several variations of hashing algorithms are known with
regard to key collision and deletion handling, apart from
the choice of hash functions. We summarize below our
considerations on the two issues. For detailed discussion,
see the papers cited.

Open addressing vs. chaining methods for collision resolution

• When the bits required for chaining are rightly taken into
  account, the overall performances of the two are nearly
  equal.

• The open method is more amenable to parallelism of memory
  accesses than chaining.

Hence, the open addressing method is selected for our
implementation.

With or without key deletion

• Traditional applications of hashing such as symbol table
  manipulation in language translation may not require
  handling of key deletion, since a symbol table is
  discarded as a whole when compilation (or assembling) is
  over.

• However, in the advanced applications to be discussed in
  section 6, key deletion handling is indispensable.

• Among the key deletion algorithms based on the open
  addressing method, an efficient method developed in [7]
  requires extra hardware resource in memory (collision
  number counters in each memory word).

• In our implementation, it is expensive to incorporate
  extra bits in each word without losing the compatibility
  with the target computer architecture.

The above considerations lead us to adopt a key deletion
algorithm which makes use of three states of a memory word,
i.e. 'deleted' (all 1), 'empty' (all 0) and 'occupied' (bit
patterns other than the above two bit patterns).


The difficulty with this algorithm is that the 'deleted'
words accumulate after repetitions of key deletions and
insertions. This causes degradation of the performance,
especially of unsuccessful searches. We need a clean-up
operation of the hash table, i.e. to reclaim 'deleted' words
that are no longer in collisions with other keys and to turn
them into the 'empty' state, relocating keys if necessary.
Without collision number counters, this operation must be
performed with the aid of software (rehashing all the keys
in the table) in conjunction with garbage collection. The
hardware must, however, have a function for monitoring the
performance in order to determine when to initiate the
garbage collection.


3. Description of the Hashing Hardware


Figure 2 shows our experimental system incorporating the
hashing hardware unit (HU). It is the implementation of the
model in Fig. 1 with n=1 and b=4 in the case of
single-length (16 bit) keys. The hashing hardware consists
of two parts: hash addressing unit (HAU) and hash table
memories (HM). The conventional ALU (16 bits) is
microprogram controlled. Without HAU, the system can emulate
an existing mini-computer (particularly suited for PDP 11).
With the hashing hardware, the instruction repertoire of the
processor is augmented with the hashing instructions given
in Table 1.

HAU is further divided into three parts: hash address
generator (HAG), hash code generator (HCG) and hash table
descriptor unit (HTDU), as shown in Fig. 3. HCG is used to
generate, out of a key k, bit patterns (hash codes) which
are then input to HAG for the generation of a hash address
sequence (h_i). HAG implements the following generation
algorithm (cf. Fig. 3):

Let a and a' be the hash codes, and P be the size of a hash
table (cf. Fig. 1). P should be a prime number. To generate
h_0 and Δh, we use a mask value m which satisfies the
relation

$$
\begin{aligned}
& h_0 \leftarrow a \wedge (2^m - 1), \qquad \Delta h \leftarrow a' \wedge (2^m - 1) \\
& \text{if } h_0 \geq P,\ h_0 \leftarrow h_0 - P \\
& \text{if } \Delta h \geq P,\ \Delta h \leftarrow \Delta h - P \\
& \text{if } \Delta h = 0,\ \Delta h \leftarrow 1 \\
& \text{for } i = 1, 2, \ldots, P-1: \\
& \qquad h_i \leftarrow h_{i-1} + \Delta h \\
& \qquad \text{if } h_i \geq P,\ h_i \leftarrow h_i - P
\end{aligned}
$$

HM's are realized by logic-in-memory cards, each having 32 k
bytes of memory. They are interfaced to the common bus
(Unibus) (hence accessed as main memory via the memory
management unit (MMU)), and have the following functions:

• parallel read operations of HM1-HM4 which are invoked by
  HAU,

• pattern matching capabilities, which detect 'deleted' and
  'empty' states, and key matches.

The hash table descriptor unit (HTDU) in Fig. 3 contains 236
table descriptors and each provides the hash table base,
size, and the other auxiliary information to be used in HAG,
the microprogram control unit and ALU. The descriptor of
each hash table can also be used to generate an 18 bit
address without the use of MMU.

The hashing control is realized by microprogram and its
algorithm is discussed in the next section.
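
The hash address generation of this section can be sketched in software as follows. This is only a minimal Python illustration, not the microcoded HAG: the function name `hash_address_sequence` is hypothetical, and the assertion on the mask width m (chosen so that a single conditional subtraction of P brings a masked value into range) is an assumption, since the relation the mask must satisfy is not legible in the source.

```python
def hash_address_sequence(a, a_prime, P, m):
    """Yield h_0, h_1, ..., h_{P-1} from the two hash codes a and a_prime.

    P is the table size (assumed prime); m is the mask width in bits.
    """
    mask = (1 << m) - 1
    # Assumption: one subtraction of P must be enough after masking.
    assert P <= mask < 2 * P, "mask width m does not fit table size P"

    h = a & mask              # initial address h_0
    if h >= P:
        h -= P
    dh = a_prime & mask       # increment (delta h)
    if dh >= P:
        dh -= P
    if dh == 0:               # a zero increment would never move
        dh = 1

    yield h
    for _ in range(1, P):
        h += dh
        if h >= P:
            h -= P
        yield h


if __name__ == "__main__":
    print(list(hash_address_sequence(5, 3, 13, 4)))
```

Because P is prime and the increment lies between 1 and P-1, the sequence visits every row of the table exactly once before repeating.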


4.  Basic  Hashing  Algorithms 


Given key k, let $k_i$ be the simultaneously read-out key
from bank i, for i = 1, 2, ..., b.

We define the following signals to be used in the
microprogram control unit:

$$ M = m_1 \vee m_2 \vee \cdots \vee m_b $$

$$ E = \overline{M} \wedge (e_1 \vee e_2 \vee \cdots \vee e_b) $$

$$ D = \overline{M} \wedge (d_1 \vee d_2 \vee \cdots \vee d_b) $$

where

$e_i$ is the result of the comparison of $k_i$ and 'empty',

$d_i$ is the result of the comparison of $k_i$ and
'deleted', and

$m_i$ is the result of the comparison of $k_i$ and k,

and are generated in memory bank HMi.

We should note that the comparisons are performed in
parallel and that the results (M, E and D) are available
immediately after the completion of the key read operations.

Algorithm S (key search)

Instruction HSR is implemented by this algorithm.

Step 1. Set i ← 0.

Step 2. Compute a hash address h_i.

Step 3. Access the hash table.
        (M, E and D are available at the end of this step.)

Step 4. If M then return the matched position.
        If E then terminate the algorithm.
        (Key k does not exist in the table.)
        Otherwise, set i ← i+1, and go to Step 2.

The key deletion algorithm is similar to Algorithm S:
replace the first line of step 4 above with
"If M then put 'deleted' in the matched position".
Instruction HSD is used to execute the deletion algorithm.

The key insertion algorithm which corresponds to HSI is as
follows:

Algorithm I (key search and insertion)

Step 1. Set i ← 0.

Step 2. Compute a hash address h_i.

Step 3. Access the hash table.

Step 4. If M then the algorithm terminates.
        (Key k already exists.)
        If E∧D then put k in the 'deleted' position,
        and terminate the algorithm.
        If E then put k in the 'empty' position
        and terminate the algorithm.
        If D then set t ← the 'deleted' position,
        set i ← i+1, and go to step 5.
        Otherwise, set i ← i+1, and go to step 2.

Step 5. Compute a hash address h_i.

Step 6. Access the hash table.

Step 7. If M then terminate the algorithm.
        (Key k already exists.)
        If E then put k in position t
        and terminate the algorithm.
        Set i ← i+1.
        Go to step 5.

Instruction HMI is used to insert a new key that is known to
be non-existent in the hash table. Therefore, the algorithm
for HMI is only to repeat the table look-ups until either E
or D becomes true.

Execution of the hashing instructions is interrupted when
the number of table look-ups exceeds the pre-specified value
(steps not shown in the above algorithms). Counting the
number of interrupts, the hashing software can monitor the
performance of the table look-up operations of a particular
hash table; thus we can tell when to invoke the clean-up
operation as discussed in section 2. Returning from the
interrupt and restarting the instruction is performed by
instruction HRTI. Instructions on 'virtual' keys are
discussed in section 6.


Key types

The hardware has to cope with multiple-length keys, since
the keys are often strings of characters, complex data
structures, etc. The operation of HU is not affected by any
attribute of the bit pattern (data type) other than the
length.

The basic lengths are 'single' (16 bits), 'double', and
'quadruple'. Longer keys are treated either as 'virtual'
keys (cf. section 6) or as lists. Hash tables are created to
be one of the above types, 'pair' (i.e. pair of a single
length key and the associated value) or 'virtual'. The type
information is put in the descriptor (obtained from the
descriptor) by instruction PTHT (GTHT). This type
information is used to invoke the appropriate micro code at
the execution time of HSR, HGV etc. Note that for 'double'
keys, the hash table appears as two-bank (b=2), and for
'quadruple' keys, as one-bank (b=1).
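
The search, deletion and insertion algorithms of this section can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the microprogram itself: the class name `ParallelHashTable`, the method names, and the simple modular double hashing in `_addresses` (a stand-in for HCG and HAG) are all hypothetical. One row of b words stands for the b banks read in parallel, and the three word states follow section 2 ('empty' is all 0, 'deleted' is all 1 for 16-bit words).

```python
EMPTY, DELETED = 0x0000, 0xFFFF   # reserved bit patterns of a 16-bit word


class ParallelHashTable:
    def __init__(self, P, b=4):
        self.P, self.b = P, b                     # P rows (prime), b banks
        self.rows = [[EMPTY] * b for _ in range(P)]

    def _addresses(self, key):
        # Hypothetical stand-in for the hardware address generator.
        h = key % self.P
        dh = 1 + (key // self.P) % (self.P - 1)
        for _ in range(self.P):
            yield h
            h = (h + dh) % self.P

    def _signals(self, row, key):
        # Per-bank comparisons, then the combined signals M, E and D.
        m = [w == key for w in row]
        e = [w == EMPTY for w in row]
        d = [w == DELETED for w in row]
        M = any(m)
        return M, (not M) and any(e), (not M) and any(d), m, e, d

    def search(self, key):                        # Algorithm S
        for h in self._addresses(key):
            M, E, D, m, e, d = self._signals(self.rows[h], key)
            if M:
                return (h, m.index(True))         # matched position
            if E:
                return None                       # key is not in the table
        return None

    def delete(self, key):                        # Algorithm S with 'deleted'
        pos = self.search(key)
        if pos is None:
            return False
        self.rows[pos[0]][pos[1]] = DELETED
        return True

    def insert(self, key):                        # Algorithm I
        assert key not in (EMPTY, DELETED), "reserved bit pattern"
        t = None                                  # first 'deleted' position seen
        for h in self._addresses(key):
            M, E, D, m, e, d = self._signals(self.rows[h], key)
            if M:
                return False                      # key already exists
            if D and t is None:
                t = (h, d.index(True))
            if E:
                if t is None:
                    t = (h, e.index(True))
                self.rows[t[0]][t[1]] = key
                return True
        if t is None:
            raise OverflowError("hash table is full")
        self.rows[t[0]][t[1]] = key               # every row probed, no match
        return True
```

The interrupt on exceeding a pre-specified number of look-ups is not modelled; the sketch simply stops after P rows have been probed.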

5. Evaluation of the Performance

Figure 4 is the timing chart of HSR operating on a 'single'
key. The actual values of the three clock periods shown in
Fig. 4 are approximately 300, 400 and 1000 ns respectively,
and therefore the estimated execution time (excluding the
fetch and decode time) of HSR in the case of successful
search is 1.6+1.3i micro sec, where i is the number of hash
table accesses. i depends upon the load factor of the table
and the number of memory banks. The values of i based on the
theoretical analysis are given in the references. In the
parallel hashing schemes, i is mostly equal to 1, unless the
hash table is heavily loaded.

Table 2 shows the timing of typical runs which make use of
HSR. We can observe a performance enhancement by a factor of
ten over software hashing. Similar improvements of the
performance are observed in the case of the other hash
instructions.
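
As a worked example of the estimate above, the following short Python sketch evaluates 1.6+1.3i for a few values of i. The function name `hsr_time_us` is hypothetical, and the figures exclude fetch and decode time, as stated in the text.

```python
def hsr_time_us(i):
    """Estimated HSR execution time in micro sec for i hash table accesses."""
    return 1.6 + 1.3 * i


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(i, round(hsr_time_us(i), 1))
```

With i = 1, the usual case in the parallel scheme, the estimate is about 2.9 micro sec per successful search.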


6. Application of the Hashing Hardware

Although the hashing hardware is designed to be as general
as possible, in this paper we only give the following
applications. This is because these are used in existing
software systems and the effectiveness of the use of hashing
is already established. The hardware replacement of the
hashing software algorithm will greatly speed up the
operations as observed in section 5.

(1) symbol table manipulation in assemblers and compilers,

(2) property list handling,

(3) creation of a unique copy of data structures to enable
    fast equality checking [2, 9],

(4) as a special case of (3), hash 'cons' in Lisp for the
    sharing of sub-data structures and fast equality
    checking [2],

(5) set operations [9].

Symbol table manipulation

Figure 5 illustrates data structures of the symbol tables to
be used in conjunction with HU.

In Fig. 5, HT1 is the 'pair' type hash table.

When the key is 16 bit, the key itself is put in the key
part of the hash table. Longer keys are accommodated as a
pointer to some appropriate entry of another hash table
(e.g. when a key is 'double', a pointer to an entry of HT2
is placed in HT1).


Property list handling

A property list is a Lisp terminology [10]. An
implementation method as given in reference [10] relies on
sequential search of lists. The method discussed here is a
speed-up version of property list handling using hashing.
For example, the Lisp code (GET OBJECT ATTRIBUTE) may be
executed (interpreted) as

    HSR  t1,a    ; a points to a double-word key
                 ; consisting of pointers to
                 ; atoms OBJECT and ATTRIBUTE,
                 ; and t1 denotes a hash table
                 ; number.
                 ; This instruction searches for
                 ; a Lisp cell constructed by
                 ; hashed cons(OBJECT, ATTRIBUTE)
    BNE  UNSUC   ; If not in the hash table,
                 ; unsuccessful search
                 ; (result in r)
    MOV  r,a
    HGV  t2,a    ; t2 is the 'pair' type
                 ; hash table, where the value
                 ; associated with
                 ; (OBJECT ATTRIBUTE) is stored.
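
The two-step look-up above can be mimicked in software as follows. This is a rough Python sketch only: Python dictionaries stand in for the two hash tables, and the class name `PropertyStore` and its methods are hypothetical names introduced for illustration.

```python
class PropertyStore:
    def __init__(self):
        self.t1 = {}   # stands in for the 'double'-key table: (object, attribute) -> cell
        self.t2 = {}   # stands in for the 'pair' table: cell -> value

    def put(self, obj, attr, value):
        # Create (or reuse) the unique cell for the pair, then store the value.
        cell = self.t1.setdefault((obj, attr), len(self.t1))
        self.t2[cell] = value

    def get(self, obj, attr):
        cell = self.t1.get((obj, attr))   # corresponds to HSR t1,a
        if cell is None:                  # corresponds to BNE UNSUC
            return None
        return self.t2.get(cell)          # corresponds to HGV t2,a


if __name__ == "__main__":
    store = PropertyStore()
    store.put("OBJECT", "ATTRIBUTE", 42)
    print(store.get("OBJECT", "ATTRIBUTE"))
```

Both steps are constant-time table look-ups, in contrast to the sequential search of a property list.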


Creation of unique copy of complex structures

In general, a complex structure cannot be treated directly
by HU, unless it is built up of uniform structures such as
lists in Lisp. It should be formatted so that HU can handle
it. One way to handle the complex structure is to make an
abbreviated key (p.343 in Knuth [6]) or v(virtual)-key*1 out
of it. How to make the v-key is in the realm of software. To
treat a v-key as a proper hash key is that of hardware.

In treating a v-key, we should note that:

• creation of a v-key out of a complex structure is a
  many-to-one mapping,

• hence, HU has to cope with the situation of multiple key
  matches.

The search algorithm on a v-key differs from Algorithm S in
the following points:

1. When a v-key match occurs, it saves the current hash
   status (e, d, m, h, Δh), and returns the pointer to the
   r-key*1 (performed by instruction HGR).

2. The associated software checks whether the r-keys match.

3. If an r-key match occurs, the search ends successfully.
   Otherwise, the search restarts from the next point where
   it is suspended, after restoring the hash status
   (performed by instruction HGRN).

4. When E*2 becomes true, the search terminates
   unsuccessfully.

As a special case, we consider the case that the key itself
is again a pointer to a hash table. This is the case where a
set is implemented. Figure 6 shows the data structure. The
search algorithm is as follows:

1. Compute the v-key using a symmetric hash function g,
   i.e. g(x,y)=g(y,x), since the order of elements of a set
   is insignificant.

2. Use HGR and find the v-key match.

3. If E then terminate the algorithm (unsuccessful search).

4. Use HSR to test the matches of each element of the hash
   tables.

5. If all the elements match, terminate the algorithm;
   otherwise find the next v-key match by HGRN and go to 3.

We should note that since HU is used recursively, we need to
save the contents of the temporary storage in HU (i.e. Δh,
h_i), besides the status signals, in the v-key processing
(execution of HGR and HGRN). Hence, we actually have
duplicate registers in HU: one set for r-key hashing and the
other for v-key hashing.

In the case of lists, we can do without r-keys. HT3 in
Fig. 5 illustrates the shared linked list constructed by
unique 'cons' by hashing.


*1 When necessary, we use the term 'r(real)-key' to denote
   the key other than the v-key, to clarify the difference.

*2 Strictly speaking, E is not the same as that defined in
   section 4, since the scan of signals may start from a
   bank different from 1, and since multiple matches may
   occur.
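
The v-key search for sets described in this section can be illustrated in software as follows. This is a minimal Python sketch under stated assumptions: the XOR-based function `v_key` is just one possible symmetric hash, not the function g used by the authors, and the class `UniqueSets` with its bucket lists is a hypothetical stand-in for the hash table and the HGR/HGRN instructions.

```python
def v_key(elements):
    """Symmetric 16-bit v-key: the result does not depend on element order."""
    v = 0
    for x in elements:
        v ^= hash(x) & 0xFFFF
    return v


class UniqueSets:
    """Keeps one unique copy (r-key) of every set that is interned."""

    def __init__(self):
        self.buckets = {}          # v-key -> list of candidate r-keys

    def intern(self, elements):
        candidate = frozenset(elements)
        matches = self.buckets.setdefault(v_key(candidate), [])
        for r in matches:          # successive v-key matches (HGR, then HGRN)
            if r == candidate:     # r-key comparison, done by software
                return r           # successful search: reuse the unique copy
        matches.append(candidate)  # unsuccessful search: register a new copy
        return candidate


if __name__ == "__main__":
    sets = UniqueSets()
    a = sets.intern([1, 2, 3])
    b = sets.intern([3, 2, 1])
    print(a is b)
```

Once every set has a unique copy, testing two sets for equality reduces to comparing two pointers, which is the point of application (3) above.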

7. Advantages of Hashing over Alternative Techniques

To summarize the applications discussed, we see that hashing
is used essentially in three ways: (1) associative
retrieval, (2) construction of a unique copy of a data
structure for fast equality checking, and (3) as a
consequence of (2), sharing the sub data structures in
constructing complex structures.

Associative retrieval by hashing is based upon the
single-hit property of keys. This operation could be
performed by associative memories such as surveyed by Yau
and Fung. However, we consider hashing more advantageous in
our problem domain for the following reasons:

• Hashing is based upon conventional RAMs (Random Access
  Memory chips), which are simpler in structure at gate
  level by at least one order of magnitude than associative
  memory chips, e.g. Intel 3104, not to mention the cost
  performance.

• Furthermore, with the same level of semiconductor
  technology RAMs are faster than associative memories;
  hence in many applications hashing is faster than
  associative processing based on associative memories.

• Larger scale implementation is possible with our hashing
  scheme; the size of the table is limited only by the
  address space of the main memory.

• The full capability of the host CPU can be utilized in
  conjunction with hash table manipulation with no
  additional hardware cost, since hash tables are realized
  in main memory.

• Hence the capability of associative retrieval is easily
  incorporated into an existing architecture as shown in
  previous sections.

• A variety of data structures can be used in hashing, since
  they are realized in RAMs (main memory), whereas in
  associative memories data structures would be subjected
  to the hardware memory word configuration.

As for the second and third usage of hashing, corresponding
efficient algorithms (e.g. set operations) based on
associative memories would be difficult to develop.
Different approaches to these applications would be
necessary.

8. Concluding Remarks

We have shown how hashing can be implemented by hardware and
given some illustrative examples of its use.

The architecture shown in Fig. 2 reflects the basic
requirements for the hashing hardware as given in [7]. It
also reflects the design compromise imposed by practical
considerations for the experimental system, such as cost
performance, compatibility with the existing system,
limitations of the system, etc. We briefly discuss the
alternatives we could have taken if some of the above
limitations were removed.

Let us take the execution of [illegible] operating on a
'single' key (without key deletion), for example. The
average execution time is divided into 41%, 12% and 47% when
the load factor is 0.9, for memory accesses, key
[illegible], and other micro operations, respectively. This
ratio indicates that the hashing operations are memory
limited.

If faster memories or caches are available, the speed of
hashing will be further improved.

The generation algorithm of hash address sequences given in
section 3 suffers from non-uniformity of h_0 and Δh when
the size of the table, P, is not close to a power of 2. The
trade-off between speed and the uniformity of the
distribution of initial hash addresses is discussed
elsewhere [12].

Examining the figures in Table 2, we can conclude that the
choice of n=1 (one HAG) and b=4 (four memory banks) seems to
be adequate (see the reference for further discussions).
However, with additional hardware, we would have chosen
parameters n=b=4 (a slight increase of the performance will
result), or [illegible] with each memory card equipped with
64 k bytes. Then all the memories could be usable for
hashing.

Software which makes extensive use of the hashing hardware
is not yet completed. Full evaluation of the hardware has to
await the software development. The experience with the
design and construction of the hashing hardware will be used
to build a larger system for symbolic algebra [13]. We hope
that the instruction repertoire will provide data to
standardize the hashing operations both in hardware and
software. We also hope that in high level language machines
hashing hardware will be incorporated as an integrated unit,
since hashing is believed to speed up essential search
operations in interpreter-based systems such as Lisp and a
direct execution machine for high level languages.

Instruction   Function

HSR           Search key
HGV           Get value of 'pair'
HPV           Put value in 'pair'
HMI           New key insert
HSI           Search and insert
HSD           Search and delete
HGR           Get real-key
HGRN          Get real-key next
HPR           Put real-key
HDX           Delete existing virtual-key
HRTI          Return from hash interrupt
PTHT          Put in hash table descriptor
GTHT          Get from hash table descriptor

Table 1 List of Hashing Instructions


                            case 1H   case 2H   case 1S    case 2S

HSR for 'single' keys       6.1       6.6       5.5x10     8.3x10
HSR for 'double' keys       1.1x10    1.2x10    1.2x10^2   1.7x10^2
HSR for 'quadruple' keys    1.8x10    2.0x10    2.0x10^2   2.3x10^2

(in micro sec)


Note:

1. Values are average execution timings when accessing all
   the keys that are

   1H: filled up to 50% of the table that is initially
       'empty',

   2H: filled up to 80% of the table that is initially
       'empty'.

   Cases 1S and 2S are those obtained by executing
   equivalent pure software (using standard PDP11
   instructions) hashing algorithms on the same machine.

2. Timings include fetch and decode time and interrupt
   handling time if an interrupt occurs.


Table 2 Average Execution Timings of HSR
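
The speed-up of the hardware over pure software hashing can be read off Table 2 with a short calculation. The Python sketch below is illustrative only: the dictionary `timings_us` and the function `speedups` are hypothetical names, and the entries printed as "5.5x10" or "1.2x10^2" are read as 55 and 120 micro sec, which is an assumption about the table notation.

```python
# Average HSR timings in micro sec, transcribed from Table 2.
timings_us = {
    "single":    {"1H": 6.1,  "2H": 6.6,  "1S": 55.0,  "2S": 83.0},
    "double":    {"1H": 11.0, "2H": 12.0, "1S": 120.0, "2S": 170.0},
    "quadruple": {"1H": 18.0, "2H": 20.0, "1S": 200.0, "2S": 230.0},
}


def speedups(row):
    """Return (software/hardware) timing ratios for the 50% and 80% cases."""
    return row["1S"] / row["1H"], row["2S"] / row["2H"]


if __name__ == "__main__":
    for key_type, row in timings_us.items():
        half, heavy = speedups(row)
        print(key_type, round(half, 1), round(heavy, 1))
```

For 'single' keys the ratios come out at roughly 9 and 13, in line with the factor of ten quoted in section 5.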

Figure 2 System with Hashing Hardware

Figure 3 Block Diagram of Hash Addressing Unit

Figure 5 Representations of a Symbol Table


Abstract

Recently introduced database machine proposals are
critically reviewed. A new architecture for the cell
processor of the RAP database machine utilizing multiple
microprocessors and LSI serial memories is presented. The
proposed cell processor, designed down to the logic gate
level, embodies concepts of modularity, flexibility, and
firmware driven query processing. The concept of firmware
execution of high level RAP assembler instructions is
presented. The results of various analyses of the analytical
and simulation models of the new architecture, which were
carried out elsewhere, are summarized. Special emphasis is
given to bulk memories that have start-stop controllability
(like magnetic bubble memories or RAM arrays simulating
serial access), together with the increases in functional
capability and performance obtained by incorporating such
memories.

KEYWORDS: DATABASE MACHINES, ASSOCIATIVE PROCESSORS,
DATABASE MANAGEMENT, LSI MEMORIES,
MICROPROCESSORS, COMPUTER ARCHITECTURE


Introduction 

The idea of providing backend computers for the efficient
management of large databases, as a substitute for the slow
software access methods, has received considerable attention
in recent years. The research efforts spent in this area
have received deserved recognition with two special issues
of IEEE journals.

In the last years, many specialized processors for handling
the database management operations have been proposed. Among
these there are CASSM [3] to process hierarchies and tables,
RARES [4] for relational database management and RAP [5, 6]
that has been implemented at the University of Toronto and
has also undergone certain changes. DIRECT [8, 9] is being
implemented at the University of Wisconsin. Other proposals
include the Database Computer (DBC) [10, 11] as a backend
processor-memory complex and the Bubble Memory Relational
System [12].

In this paper we will first survey the most recent research
efforts in the database machine field and then present a new
approach to the RAP processor architecture, beyond that of
RAP.2 [7], utilizing LSI technology, like off-the-shelf
microprocessors, magnetic bubble memories (MBM), high
density bulk RAM chips, etc.

Survey  of  Recent  DBM  Proposals 

Most  of  the  recent  database  machine  proposals 
have  exploited  the  advances  in  technology  by 
incorporating  microprocessors,  CCD's,  MBM's  and 
the  like. 

DIRECT is a system for supporting relational databases. The
system comprises a host for interfacing with the users, a
backend controller for coordinating the overall database
machine hardware and software, mass storage units for
storing the database, a set of query processors, and CCD
page frames for holding the relation pages that are being
processed.

In this system, the query processors and CCD page frames are
connected to each other by a cross-bar switch, so that all
processors can access all page frames. Although this
cross-bar switch is much simpler than conventional cross-bar
switches, it may not be cost effective and may also reduce
performance in larger implementations of this system, as
proposed in [9] with 103 processors. This is because, as the
number of processors and page frames increases, the
selector/decoder networks at the processor interfaces and
the gating networks at the page frame interfaces of the
cross-bar switch grow in size, thereby introducing extra
delays in the data transfers between the processors and the
page frames, and hence decreasing performance considerably.

Another feature of the DIRECT system is that the results of
the basic relational algebra operations executed by the
query processors are treated as temporary relations and are
written onto free page frames allocated by the controller.
The number of temporary relation page frames depends on the
number of query processors assigned to the query.

This scheme increases the query processor-controller interaction
during page frame processing because of temporary page frame
requests, and may introduce unnecessary page faults for some
other set of query processors executing another query
concurrently, simply because their page frames may be assigned
to the temporary relations of a higher priority query. In this
way, the degree of parallelism may drop seriously because of the
creation of temporary relations. The temporary relations may
cause an even more serious performance degradation during join
operations, in which the system page frame resources have to be
partitioned between the source and result relations.

The join operation may produce result relations with sizes
comparable to the source relation, and it is very likely that
this system will suffer from thrashing during the join operation.

The Database Computer (DBC) is a system proposed for very large
databases and a variety of data models, utilizing modified
conventional moving head disks. The basic system comprises two
processing loops: the structure loop for pipelined processing of
the keywords and record indices, and the data loop for actually
processing the database contents.

One of the major drawbacks of this system is its way of
representing data as attribute-value pairs. This scheme of
repeating the attribute information wastes a considerable amount
of data space. Another drawback is that the number of processors
doing the actual processing is very small compared with the
database size, thereby reducing the parallelism that should be
inherent in database machine systems. Furthermore, the number of
interconnections required between the disk drive array and the
track information processors may be prohibitive in terms of cost
and physical requirements for the configuration proposed.

The DBC relies on the concept of partitioned content addressable
memory (PCAM) for data accesses. A PCAM is one cylinder of a disk
volume and is the largest amount of memory that can be processed
with the limited number of processors. One PCAM can be processed
in one disk revolution, but if the qualification for a retrieval
is complex and/or if the data to be processed occupies a large
number of cylinders, then many disk revolutions are necessary for
processing the data. The relational operation of join is also
executed in a very inefficient manner. First, all the qualified
domain values of the source relation are retrieved, and then, for
each source value, another retrieval instruction over the target
relation is issued.

This implies that the number of instructions executed by the
track information processors depends directly on the number of
source domain values.

The performance study of this system in supporting relational
databases1' shows that a general purpose conventional computer
performs better than DBC for large relations (e.g. with 20000
tuples) with reasonably large tuple sizes. This in turn implies
that this system, although designed to support large databases
efficiently, cannot support a database with large relations as
efficiently as a conventional computer, despite the additional
hardware costs introduced.


Furthermore, since this system also relies on the concept of
index processing (although in hardware), problems similar to
those incurred by update operations on conventional systems are
likely to occur in DBC, because the structure memory must be
updated so as to reflect the result of the update.

Utilization of MBM's for supporting relational databases has
been recently proposed by Chang12.

The proposed hardware comprises MBM chips with certain
augmentations to facilitate associative selections. A relation is
mapped on one or more MBM chips with tuples across the minor
loops and the domains along the minor loops. It is claimed by the
author that augmentation of the MBM chips with off-chip indexing
loops provides convenient indexing during data qualification and
avoids redundant traversing of disqualified data. Two off-chip
registers and a one bit comparator are provided for the database
operations. The instruction set of this system is said to be
inspired by that of RAP, with minor changes.

The operational deficiencies of this system result mainly from
the following: Since the hardware employed is substantially small
and simple, provisions for in-place updates have not been made.
Furthermore, the existence of only one comparator limits parallel
comparisons on data, and hence limits query complexity. Also, the
join operation is handled implicitly as in RAP, but only a single
domain value from a source relation is transmitted to the target
relation per scan. This mode of operation may severely degrade
the performance of such a system in a join operation.

The  following  sections  describe  a  restructur¬ 
ing  of  the  RAP  cell  processor  utilizing  off-the- 
shelf  microprocessors  and  bulk  serial  memories, 
especially  MBM's.  The  proposed  system  differs 
considerably  from  the  previous  designs  of  RAP. 

First, the hardware structure of the cell is configured into a
more regular and modular structure, and the hardware complexity
in terms of chip count has been reduced to a third of that of the
previous designs. Secondly, query processing driven by
microprocessor firmware and the utilization of start/stop
controllable memories such as MBM and/or high density RAM's
permit highly complex data qualifications and a highly efficient
join operation. The proposed system can be considered as a RAP.3
system. The reader, after following the paper, can draw a
comparison of other database machines with the enhanced features
of RAP, as also summarized in the conclusion, including
especially the join operation.

The RAP database machine can also be regarded as a good example
of a High Level Language Computer Architecture. Since the present
discussion deals with the architectural aspects of the new
version of the RAP cell structure, and since the basic RAP
architecture along with its instruction set is covered
elsewhere5,6,7, we will be content with providing only a summary
description of the latest RAP instruction set in Appendix-4.

The CM of each cell also has a different structure than its
counterpart in the previous designs6,7. The CM is chosen as a
word serial organization, mainly to fit the data access port size
to the subcell microprocessor data bus width and also to
incorporate rather slow, newly emerging bulk memory technologies,
like MBM's or high density RAM's (e.g. 64 K), in a parallel
organization so as to enhance the effective data rate. A RAP
relation is mapped directly onto the CM, so that the logical and
physical structures of data are exactly the
domains is increased to 16.

Numeric domains can be 2 or 4 bytes with 2's complement
representation, and non-numeric domains can be as long as
required, provided that the sum of the domain lengths is less
than or equal to the maximum tuple size of 1024 bytes.
Furthermore, other data types like floating point numbers can be
easily supported without any extra hardware. Figure-2 shows the
format of the cell CM.

Figure-1 Structure of the new RAP cell.

Operation  of  the  Cell 

The linear array of subcells provides multiple buffers (as small
RAM's) for the tuples coming from the CM. At any time during CM
circulation, more than one tuple can be out of the CM. Such a
tuple may be in the state of being loaded into a subcell buffer,
being stored into CM from a subcell buffer, or being processed in
a subcell. The existence of multiple buffers provides the
necessary time for processing the tuples, thereby synchronizing
the data move and data processing rates. The sequence of
operations during a circulation of CM can be described with the
process/time-slot diagram given in Figure-3.

In Figure-3, $L_j$, $P_j$ and $S_j$ denote the load, process, and
store states of some tuple for subcell j, respectively. When the
CM circulation is initiated, successive tuples are loaded, via
DMA, into successive subcells starting with subcell 1, until the
end of the (k-1)th tuple. In order to stay in synchronization,
the first tuple should be stored from subcell 1 while the kth
tuple is being loaded into subcell k, and the 2nd tuple should be
stored from subcell 2 while the (k+1)th tuple is being loaded
into subcell 1, etc. During the circulation, each subcell
microprocessor is initiated for processing as soon as its buffer
is loaded with a new tuple.
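The rotation of load and store slots described above can be sketched as a small schedule generator. This is only an illustrative model, not the DMA controller's actual logic: the function name `slot_schedule` and the 1-based slot numbering are assumptions made here, and the sketch assumes at least three subcells so that one can load, one can store, and at least one can process.

```python
def slot_schedule(num_tuples, k):
    """Return one entry per time slot:
    (tuple loaded, subcell loaded into, tuple stored, subcell stored from).

    Hypothetical model: in slot i, tuple i is loaded while tuple i-(k-1)
    is stored; subcells are used round-robin, numbered 1..k.
    """
    if k < 3:
        raise ValueError("this sketch assumes at least 3 subcells")
    schedule = []
    for slot in range(1, num_tuples + k):            # slots 1 .. NT+k-1
        load = slot if slot <= num_tuples else None  # nothing left to load at the end
        store = slot - (k - 1) if slot - (k - 1) >= 1 else None
        load_cell = (load - 1) % k + 1 if load else None
        store_cell = (store - 1) % k + 1 if store else None
        schedule.append((load, load_cell, store, store_cell))
    return schedule


if __name__ == "__main__":
    for slot, entry in enumerate(slot_schedule(8, 4), start=1):
        print(slot, entry)
```

With 8 tuples and k = 4, slot 4 loads tuple 4 into subcell 4 while storing tuple 1 from subcell 1, and slot 5 reuses subcell 1 for tuple 5 while tuple 2 is stored from subcell 2, which matches the sequence given in the text.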

It is evident that during the processing of CM contents, only
(k-2) of k subcells are actually active at a given time. This may
suggest the idea of multiplexing (k-2) processors among k tuple
buffers or, in general, multiplexing P processors among M tuple
buffers where M>P. If M is not an integral multiple of P, then a
general interconnection network (e.g. a cross-bar) should be
utilized to allocate processors to buffers. If, however, M is an
integral multiple of P, then a simple but static interconnection
scheme for multiplexing each processor among (M/P) buffers may
suffice. However, in both cases, besides the interconnection
complexity introduced, the important feature of CM wait time
utilization (to be described later) cannot be made possible.

After pointing out this alternative to the original k-parallel
microprocessor approach, the paper will continue dealing with the
dedicated k parallel microprocessor approach, to elaborate on the
wait schemes and to preserve the modularity of the cell
architecture.

Figure-3 Load/process/store sequences of cell operation

As was pointed out above, the processing time allocated for a
subcell after its tuple is loaded is

$$T_{PR} = (k-2) \, T_{LS}$$

where k is the number of subcells in a cell and $T_{LS}$ is the
DMA load/store time for a tuple. It should be noted that the
allocated time depends on the tuple size and is larger for longer
tuples.
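To give a feel for the magnitudes involved, the allocated time $(k-2) \, T_{LS}$ can be evaluated for a given memory rate. The sketch below is an illustration under stated assumptions: the function name and parameters are hypothetical, a 16-bit (2-byte) word is assumed, and the 300 K words/sec rate used in the example is the figure quoted later in the paper for current MBM's.

```python
def allocated_processing_time_us(tuple_bytes, k, words_per_sec, bytes_per_word=2):
    """Allocated processing time (k-2)*T_LS in microseconds.

    T_LS is modeled as the time to move one tuple word-serially at the
    given memory rate (an assumption of this sketch).
    """
    words = tuple_bytes / bytes_per_word
    t_ls_us = words / words_per_sec * 1e6   # load/store time for one tuple
    return (k - 2) * t_ls_us


if __name__ == "__main__":
    for size in (64, 256, 1024):            # tuple sizes in bytes (illustrative)
        print(size, round(allocated_processing_time_us(size, 4, 300_000), 1))
```

For a hypothetical 64-byte tuple with k = 4 this gives roughly 213 microseconds, and the allocated time grows linearly with both the tuple size and (k-2).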

In any case, the worst case expected processing time should be
less than or equal to $T_{PR}$ for a given tuple size so that
synchronization is not lost. This constraint puts very high
demands on the subcell microprocessor performance and on the
number of subcells k (increasing k increases the allocated time)
if the CM cannot be controlled in a start/stop fashion (as would
be the case with rotating devices or CCD memories). Furthermore,
this constraint limits the functional capability of the subcell
by restricting the complexity of query qualification expressions.

The proper use of the start/stop feature of MBM's (or the
asynchronous access feature of bulk RAM's) relieves the above
constraints, so that hardware parameters can stay within feasible
limits. This is done in such a way that no performance
degradation occurs for average processing times, while longer
processing times corresponding to more complex qualification
expressions impose a certain dynamic performance degradation,
which can be traded off against the goal of minimizing hardware.
Furthermore, it is observed that in the execution of the
relational join operation, handled implicitly in RAP3, where a
target relation (domain) value is matched disjunctively against
an array of source relation (domain) values, the deliberate
imposition of waits on CM (by stopping CM whenever necessary)
reduces the overall time to execute the join operation. This
point will be detailed in a following section.

Figure-4 shows the process/time-slot distribution for a
controllable CM and for k = 4. The basic idea behind the utility
of the start/stop feature of controllable memories can be stated
in the following way: when the time comes to store a tuple from a
subcell buffer (e.g. storing subcell 1 while loading subcell k),
if that subcell has not yet asserted that the processing of the
tuple is complete, the CM is put temporarily in a wait state to
allow for the completion of processing. The extra time requested
by a subcell also becomes available to the (k-2) succeeding
subcells, so that the chance that they will impose further waits
is greatly reduced. An analysis of the timing of operations for
this case is presented in Appendix 1.
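The wait mechanism can be illustrated with a small discrete simulation of one CM circulation. This is a minimal sketch under simplifying assumptions, not the authors' simulator: every tuple has the same load/store time `t_ls`, the store of a tuple shares a slot with the load of the tuple (k-1) positions later, and the CM simply stalls until the tuple due for storing has finished processing. The function and variable names are hypothetical.

```python
def simulate_circulation(proc_times, k, t_ls):
    """Return (total circulation time, total wait time) for one CM pass.

    proc_times[j-1] is the processing time needed by tuple j.
    """
    nt = len(proc_times)
    done = {}            # tuple index -> time at which its processing completes
    clock = 0.0
    total_wait = 0.0
    for slot in range(1, nt + k):                 # slots 1 .. NT+k-1
        store = slot - (k - 1)                    # tuple stored in this slot
        start = clock
        if store >= 1 and done[store] > clock:    # subcell not finished: CM waits
            total_wait += done[store] - clock
            start = done[store]
        clock = start + t_ls                      # the load/store slot itself
        if slot <= nt:                            # tuple loaded in this slot
            done[slot] = clock + proc_times[slot - 1]
    return clock, total_wait


if __name__ == "__main__":
    # 8 tuples, k = 4, T_LS = 1: the allocated time per tuple is (k-2)*T_LS = 2.
    print(simulate_circulation([2.0] * 8, 4, 1.0))   # processing fits the allocation
    print(simulate_circulation([4.0] * 8, 4, 1.0))   # each tuple needs twice as long
```

In the first run no waits occur and the total is (NT + k - 1) slots. In the second run the CM stalls, but a wait imposed by one subcell also gives its successors extra time, so the total wait stays well below the sum of the individual overruns.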

Functions  of  the  Basic  Hardware  Modules 


The hardware modules given in Figure-1 have the following
functions:

SUBCELLs: They process the tuples loaded into their buffers by
the DMA CONTROLLER. The processing is driven by a query routine
loaded into SUBCELL memories prior to the initiation of a RAP
instruction.

DMA CONTROLLER: This module controls the simultaneous
bidirectional data transfers between the cell memory and subcell
buffers during the load/store operations. It also sequences the
load/process/store operations and keeps track of the cell CM
status.

BUSES:  There  are  four  buses  that  provide  data, 
address  and  control  paths  between  the  cell  modules 
during  data  transfers. 

CELL INTERFACE: This module coordinates the
overall  cell  operation  during  instruction 
initiation  and  termination,  keeps  track  of  cell 
status,  and  provides  for  the  communication  of  the 
cell  with  the  RAP  array  controller. 

Query  Execution 

In the new architecture, the microprocessors of the subcells in
each cell are the basic data processing units. Therefore, these
microprocessors can be programmed to execute RAP instructions.


The basic idea behind the emulation of RAP instructions with
microprocessor routines is that each RAP instruction can be
mapped into what is called a "query routine". The basic RAP
instruction constructs (i.e. MARK, RESET, MKED, UNMKED, updates,
set-function computations, comparisons, etc.) have simple
microprocessor code equivalents. Furthermore, the combination of
the results of various qualification tests as disjunctions or
conjunctions (or mixed forms, which were not available in the
previous designs) can be embedded into the sequential logic of
the microprocessor query routine. This mapping brings
considerable enhancements to RAP capabilities, since
qualification complexities are now limited only by the subcell
microprocessor program memory size instead of the static hardware
registers of the previous designs. Furthermore, since the whole
tuple can be accessed during processing, domain to domain
comparisons and updates are also made possible.

An  example  of  a  query  routine  is  provided  in 
Appendix  3. 
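As a rough illustration of how a qualification with mixed conjunctions and disjunctions, including a domain to domain comparison, becomes ordinary sequential code, consider the following sketch. It is not the routine of Appendix 3 and not 8086 firmware; Python is used only for readability, and the tuple layout (a dictionary with the hypothetical domains `salary`, `dept`, `bonus` and a `mark` bit) is an assumption made here.

```python
def query_routine(tuple_buf):
    """Hypothetical MARK-style routine run once per tuple in a subcell buffer.

    Marks the tuple if (salary > 1000 AND dept = 'TOYS') OR bonus > salary.
    """
    q1 = tuple_buf["salary"] > 1000                  # numeric comparison with a constant
    q2 = tuple_buf["dept"] == "TOYS"                 # non-numeric comparison
    q3 = tuple_buf["bonus"] > tuple_buf["salary"]    # domain to domain comparison
    if (q1 and q2) or q3:                            # mixed conjunction/disjunction
        tuple_buf["mark"] = True
    return tuple_buf


if __name__ == "__main__":
    t = {"salary": 1500, "dept": "TOYS", "bonus": 0, "mark": False}
    print(query_routine(t)["mark"])
```

Because the qualification is plain program logic, adding further terms lengthens the routine (and its execution time) rather than requiring additional comparator hardware.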

The  subcell  microprocessor  memory  comprises  two 
parts.  The  ROM  part  contains  the  basic  qualifica¬ 
tion  evaluation  routines  (i.e.  numeric  and  non¬ 
numeric  value  comparisons)  and  routines  for  the 
relational  join  and  free  variable  operations.  The 
RAM  part  is  logically  partitioned  into  two  parts: 
one  for  the  query  routines  and  communication 
buffers,  and  the  other  for  the  tuple  to  be  processed. 


Before the initiation of a RAP instruction, the equivalent query
routine and/or necessary parameters are loaded into the RAM's of
the subcells of all the cells involved in the instruction, after
the cell interfaces connect their cell buses to the buses of the
RAP controller.

Each time a CM circulation is started and
whenever  a  new  tuple  is  loaded  into  a  subcell 
buffer,  the  microprocessor  is  forced  out  of  the 
idle  state  to  branch  to  the  query  routine.  At  the 
end  of  processing,  a  hardware  flag  is  asserted  to 
signal  the  DMA  CONTROLLER  so  that  the  tuple  can  be 
stored  back. 

The cell interface is also controlled by a microprocessor which,
after each RAP instruction is executed on the CM contents, polls
each subcell, updates the cell status, and computes (if
applicable) cell set function subresults.

Execution  of  the  Implicit  Join  Operation 

The important and frequently encountered database operation of
join is done implicitly in RAP by the cross-mark type commands.
This operation is accomplished by extracting the qualified source
domain values from the source relation cells and transmitting
them to the target relation cells until all source (master)
relation cells are processed. The execution of this operation had
to be made as efficient as possible, because it was practically
the only case where the superiority of the RAP system over
conventional systems was estimated to be less than 10-fold13.

The new architecture employs a similar scheme for this operation.
The values from qualified tuples of the first source relation
cell are read out and buffered at the RAP controller; then a
block of source values is loaded into the subcells of the target
relation cells and these cells are initiated for processing. This
block loading is repeated until all of the buffered source values
are processed; then the next source relation cell values are
buffered and the above operations are repeated until all source
relation cells are processed.

The number of source values loaded into target relation cells per
circulation depends on the size of the RAM space of the subcell.
In the current design, 400 2-byte numeric domain values
(equivalently 200 4-byte numeric values, or a total of 800 bytes
of non-numeric domain values) can be loaded and matched against a
single target value. This number, compared with the 3 to 5 of
previous RAP designs, shows a significant improvement in the
execution of the join operation. (The improvement, however, is
not as large as the ratio of the loading factors, due to the
differences in the architectures and the fact that the cross-mark
operation is now broken into discrete steps, each starting at a
new revolution, i.e. a repeated MARK instruction.)

A snapshot of cross-mark execution is provided in Figure-5.

It is evident that processing that many source values imposes
waits on the CM and hence increases the overall circulation time.
However, it is observed (in Appendix-2) that, if n is the number
of source values that can be processed without imposing any
waits, loading m x n (m > 1) source values per circulation will
reduce the number of circulations by a factor of m, while the
increase in each circulation time of the target relation cells
will be significantly less than m-fold because of the parallelism
in the cell. In this way, the overall time to process a source
cell with m x n values loaded per circulation will be less than
the overall time with n values loaded per circulation.
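The trade-off can be made concrete with a small calculation that combines the join time approximation of Appendix 2 with the circulation time bound of Appendix 1. The sketch below rests on an assumption that is not stated in the paper: loading m x n values is taken to make every target tuple need m times the allocated processing time (L = m in the notation of Appendix 1), with L = 1 when only n values are loaded. Function names and the example numbers are hypothetical.

```python
import math


def circulation_time(nt, k, t_ls, load_factor):
    """Appendix 1 bound on one circulation, with load_factor playing the role of L."""
    return ((nt / k) * (2 + load_factor * (k - 2)) + load_factor * (k - 2) + 1) * t_ls


def join_time(ns, n, m, nt, k, t_ls, t_buf):
    """T_JOIN = T_BUF + ceil(NS / (m*n)) * T_TOTAL, assuming L = m (sketch assumption)."""
    circulations = math.ceil(ns / (m * n))
    return t_buf + circulations * circulation_time(nt, k, t_ls, load_factor=m)


if __name__ == "__main__":
    # Illustrative values: 400 source values, n = 50, 100 target tuples, k = 4.
    base = join_time(ns=400, n=50, m=1, nt=100, k=4, t_ls=1.0, t_buf=103.0)
    packed = join_time(ns=400, n=50, m=4, nt=100, k=4, t_ls=1.0, t_buf=103.0)
    print(base, packed)
```

Under these assumptions the number of circulations drops by a factor of m while each circulation grows by roughly (m+1)/2 for k = 4, so the packed variant finishes sooner.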

Features  of  the  New  Design 

The new RAP cell processor based upon the concepts presented
above has been designed down to the gate level, together with the
necessary microprocessor query routines for the general RAP
instruction constructs18.

In order to arrive at a decision for the number of subcells to
use, various simulation studies were carried out17,18. Tuple
processing times were sampled from two exponential distributions.
The first distribution modeled processing times as having a
minimum of 25 μsec, a mean of 125 μsec and a maximum of 500 μsec,
to reflect the average case. The second distribution had a mean
of 1000 μsec, with 125 μsec and 2000 μsec as the bounds, to model
heavily loaded processing sessions such as those in a join
operation. It was further assumed that the controllable memory
array (16 bits wide) could deliver data at a rate of up to 600 K
words/sec. The results of these experiments are provided in
Figure-6.

It was decided that k=4 would be a cost-effective choice to
reduce hardware complexity and impose practically no waits for
the average processing times at the memory rate of 300 K
words/sec (5 M bits/sec), which is attainable by the current
MBM's.

Figure-5 Execution of the cross-mark instruction. (Source
relation cells: the join domain values are read out from each
source cell and buffered at the controller. Target relation
cells: the entire target relation is scanned completely in one
memory circulation time.)

Figure-6 Plot of normalized processing time vs k (data rate as
parameter). (a) Exponential processing time distribution. (b)
Exponential processing time distribution for CROSS MARK (MIN: 125
μsec, MEAN: 1000 μsec, MAX: 2000 μsec).

The cell design utilizes 4 subcells, where each subcell contains
an Intel-8086 microprocessor with 2 K bytes of RAM and 1 K bytes
of ROM and some additional control logic. Total chip count per
subcell is 20. The cell memory interface is configured for 16 x
92 K bit MBM's but can easily be modified for other types of
MBM's and/or bulk RAM's. (Although the paper does not say so
explicitly, the reader should not conclude that other types of
bulk serial or block addressable memories cannot be supported.
They can be, except that the further performance gains achievable
through the controllability feature are lost. The architecture
could also be conceptualized as having a bulk RAM memory with a
single microprocessor, similar to the original design. However,
the speed required of a single microprocessor would be beyond
what is conjectured for the future, at least at cost effective
scales. The cost of RAM's, which must be cheap and competitive
despite their volatility, would be another issue.) The total chip
count of this configuration is 160 per cell, which is slightly
over one third of that of the previous designs.

It should be emphasized that utilization of 8086's is a specific
case of the implementation of the proposed architecture. In fact,
besides the large data bandwidth, only the powerful string
operation instructions and a suitable subset of the remaining
general purpose instructions of the 8086 are utilized for
implementing the subcell firmware.

In a possible large scale commercial implementation, a special
purpose microprocessor with only the necessary instructions can
be developed and utilized. Depending on the cost versus speed
trade-offs, it is also possible to implement the proposed
architecture with powerful 8-bit microprocessors having fast
block operations.

The CM data rates can be as high as the memory technology
permits. For example, the 8086 based system can support a 16 M
bit/sec burst data rate for low to medium complexity
qualification terms of RAP instructions without any serious
performance degradation due to the utilization of waits. It may
be concluded that the limitations of controllable memories (e.g.
MBM's) will be the determining factor for the terminal speed of
the proposed architecture.

The simulation studies and analytical modeling of the cell
operation show that considerable performance improvements over
previous RAP designs can be attained. It has been observed by
simulation18 that the new processor performs 3-6 times better
than the previous designs, despite the fact that a larger and
slower memory is being incorporated.

The join operation, which has not been emphasized (from a
performance point of view) in other database machines, can be
performed rather efficiently, because a larger number of values
can be matched during each circulation.

Furthermore, since all the cell status information is kept by a
microprocessor at the cell interface, task switching in a
preemptive resume multiprogramming environment8,15 requires no
extra hardware. Relation status saving and restoring are
accomplished by the two new RAP instructions SAVE-MARKS and
RESTORE-MARKS19,20, which save and restore tuple mark bits into
and from special domains appended to the end of each tuple that
serve as a push down stack during task switchings.

The overall RAP system configuration with the new processor
architecture would be similar to previous RAP configurations6,7,
except that the controller for the cell array, which is currently
being designed, is expected to be a more intelligent unit. Its
main functions will be to keep track of device status by
maintaining the necessary relation and cell status tables, to
schedule the instructions of a RAP query whose instructions have
been converted to microprocessor code, to buffer data in join
operations, to control hardware and software iterative
instructions, to compute overall set function results, and to
communicate with the frontend computer. It is also expected to
perform the functions of the monitor15 for the RAP
multiprogramming and virtual memory operations. The entire
cell-array controller configuration will be driven by a
conventional frontend computer that interfaces with the users.

Conclusion 

After a survey of recent database machine proposals, a new
architecture for the RAP database machine's cell processor is
presented. The new architecture has certain advantages over the
previous hardwired RAP designs. Mainly, the hardware complexity
is decreased while the operational flexibility is increased. The
utilization of LSI components opens the way for the modularity of
the architecture. The utilization of controllable memories also
relieves the architecture from the constraints of worst case
timing requirements.

From  a  feature  comparison  point  of  view  the 
proposed  architecture  has  the  following  properties 
one  or  more  of  which  are  not  shared  by  the  other 
database  machines: 

a)  Data  qualifications  of  any  complexity  can 
be  evaluated  over  the  memory  contents  in  one 
circulation  of  the  memory. 

b) All kinds of updates and arithmetic operations can be done on
the memory contents without transferring data in and out of the
RAP system.

c) The join operation is handled in a very efficient manner. In
most of the typical cases, one target relation cell memory
circulation may suffice to process the values of one source
relation cell, compared to the large number of circulations (or
revolutions) required in the other database machine proposals.

d) Since no software access methods are utilized, no overhead on
the frontend computer is imposed.

e)  A  multiprogramming  environment  can  be 
attained  without  any  extra  hardware. 

f) It is expected that a single RAP database machine is going to
be confined within certain practical physical limits. In order to
support very large database applications, either one or a
combination of the following two system configurations can be
incorporated:

1) Virtual memory backup for a single processor

2) The database can be distributed in a network of RAP database
machines, and a given database operation can be decomposed and
executed concurrently on the network of modest size RAP's, as
shown by a previous study22.

The RAP.3 prototype implementation, along with its already
operational software, is nearing completion at METU.


plus  the  capacities  of  the  (k-1) 
subcell  buffers). 

TUAIT  s  total  time  during  which  the  cell 
memory  Is  in  the  wait  state  In  a 
circulation . 


Then  we  have  the  following  relationships: 

1-1 

-  <k  2)*T.-  +  l  w.;(1«l . NT 

Ls  j.1-(k-2)  j 

j  >  k-2  and  wQ.0) 


T, 


‘WAIT 


‘REST 

TT0TAL 


whert>  w.  are  the  wait  times 

associated  with  tuple,  (ref .Figure  4). 
NT  J 

l 

Jml  J 


-  (k-1)  T 


LS 


-  nt*tls  +  twmt+trest 
■  (NT+k-1)  *  tls  +  TwAIT 


It  should  be  noted  that  Ty^jj  Is  dependent 
on  the  complexity  of  the  query  routine  (if  k  and 

tbit  >r •  but  an  uPPer  bound  on  "'total  c4n 

we  derived  as  follows: 
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Appendix  1 

Timing  Analysis  of  Cell  Operations 


The  following  analysis  describes  the 
relationships  among  certain  timing  parameters. 

Let

$T_{BIT}$ = CM bit time = CM shift time/16

TUPLEN = length of a tuple in bits

k = number of subcells/cell (k ≥ 3 because
of the data move strategy incorporated)

$T_{LS}$ = time to load (store) a tuple via
DMA = $T_{BIT}$ * TUPLEN

$T_{Wi}$ = available time to process tuple i

NT = number of tuples in CM

$T_{TOTAL}$ = total circulation time of CM from
the start of loading of the first
tuple to the end of storing of the
last tuple.

$T_{REST}$ = extra time needed to restore the last
(k-1) tuples. (It should be noted
that CM circulation is completed only
after the last tuple is restored.
Some extra time is needed to restore
the last (k-1) tuples because the
total dynamic capacity of the cell
memory is equal to the CM capacity


Assume that all tuples require exactly L times
the time allowed by the architecture, i.e.:

$$T_{REQ} = L\,(k-2)\,T_{LS} \qquad (L \ge 1)$$

Assuming also that mod(NT, k) = 0, then
during the circulation NT/k tuples will be
processed by each subcell. The time to handle a
tuple is:

$$T_{TUPLE} = T_{REQ} + 2\,T_{LS}$$

where the last term accounts for the load and store
times.

Since the processing of the tuples is overlapped
over the k subcells, the total time for a
circulation will be:

$$T_{TOTAL} = (NT/k)\,T_{TUPLE} + (k-1)\,T_{LS} + (L-1)(k-2)\,T_{LS}$$

where the first term is the time to process NT
tuples with k subcells in parallel, the second
term is the time to restore the (k-1) tuples at
the end of the circulation, and the third term is
the initial extra time (beyond the allocated time)
required by subcell$_1$ for tuple$_1$. Inserting
$T_{TUPLE}$ gives an upper bound for the circulation
time when each tuple requires L times the
allocated time, as:

$$T_{TOTAL} = \left( (NT/k)\,(2 + L\,(k-2)) + L\,(k-2) + 1 \right) T_{LS}$$

Appendix 2

Analysis of the Join Operation Performance

The time to process the contents of one source relation cell
can be approximated as:

$$T_{JOIN} = T_{BUF} + \lceil NS/n \rceil \cdot T_{TOTAL}$$

where the first term is the time to read and buffer
the source cell values (1 CM circulation) and the
second term is the time to process the NS buffered
source values and represents $\lceil NS/n \rceil$ circulations
(i.e., n values are passed in each circulation)
of the target relation cells.

If $n_1$ is the number of source values that can
be processed in one target relation cell memory
circulation without imposing any waits (L = 1), then
the total circulation time for this case will be
(ref. Appendix 1):

$$T_{TOTAL,nowait} = (NT + k - 1)\,T_{LS}$$

while the total number of such circulations will be
$\lceil NS/n_1 \rceil$.


Neglecting the terms k-1 and m(k-2)+1 in the
last two equations, which are much less than the
corresponding terms, we can write:

$$\frac{T_{TOTAL,wait}}{T_{TOTAL,nowait}} \approx \frac{(NT/k)\,(2 + m\,(k-2))}{NT} = \frac{2 + m\,(k-2)}{k} < m \qquad \text{for } m > 1$$

It can be observed that imposing waits on the
CM by feeding in more source values per circulation
reduces the number of circulations by a factor of
1/m, while the increase in the total time of each
such circulation is less than m-fold; hence the
overall time to process the buffered source values
is reduced.

The actual execution time will in reality be
much less than the bounds derived above, because
after each target relation scan the number of
target values that have not yet been selected,
and hence will impose waits, diminishes at an
increasing rate until the last target relation
scan.


If $m \cdot n_1$ source values are processed in one
target relation cell memory circulation, then L
will be roughly m, and the total circulation time
will be:

$$T_{TOTAL,wait} = \left( (NT/k)\,(2 + m\,(k-2)) + m\,(k-2) + 1 \right) T_{LS}$$

while the total number of such circulations will
be $\lceil NS/(m \cdot n_1) \rceil$.
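
As a cross-check on the two circulation-time expressions, the following Python sketch evaluates the term-by-term form from Appendix 1 and its closed form, and confirms that L = 1 reduces to the no-wait time (NT + k - 1) * T_LS. The function and argument names are illustrative assumptions, not taken from the paper.

```python
def t_total_termwise(nt: int, k: int, big_l: float, t_ls: float) -> float:
    """Circulation-time bound built from the three terms of Appendix 1."""
    if nt % k != 0:
        raise ValueError("the derivation assumes mod(NT, k) = 0")
    t_req = big_l * (k - 2) * t_ls      # T_REQ: L times the allocated time
    t_tuple = t_req + 2 * t_ls          # T_TUPLE: processing plus load and store
    return ((nt // k) * t_tuple         # NT tuples over k overlapped subcells
            + (k - 1) * t_ls            # restoring the last (k-1) tuples
            + (big_l - 1) * (k - 2) * t_ls)  # initial extra time


def t_total_closed_form(nt: int, k: int, big_l: float, t_ls: float) -> float:
    """Same bound after inserting T_TUPLE and collecting the T_LS factor."""
    return ((nt / k) * (2 + big_l * (k - 2)) + big_l * (k - 2) + 1) * t_ls


def t_total_nowait(nt: int, k: int, t_ls: float) -> float:
    """Circulation time when no waits are imposed (L = 1)."""
    return (nt + k - 1) * t_ls
```

With NT = 1000, k = 4 and T_LS taken as the time unit, both forms agree for any L, and for L = 1 they equal the no-wait value.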


The current design employs four subcells
(k = 4) and assumes that the CM shifts at 300 kHz, giving
a $T_{BIT}$ of 208 nsec/bit; then, for 1 Kbit target
relation tuples, the allocated time is 426 µsecs.
Within this time, the INTEL 8086 routine
developed to perform equi-join on 2 byte numeric
domains can process 100 source values without
imposing any waits. Processing 400 such source
values gives L = 4. Since a target relation value
may qualify for the join before the whole source

    QUERYRTN  LEA   BP,MASKD           / check if tuple is deleted
              CALL  MKED               / previously;
              JB    NOTQUAL            / exit if deleted;
              LEA   BP,MASKT4          / check if tuple is T4 marked
              CALL  MKED               / previously;                          [MKED (T4) ?]
              JNB   NOTQUAL            / exit if not T4 marked;
              LEA   BP,PB1             / set pointer to parameter block 1;
              CALL  COMPNUM2           / call numeric comparison routine;     [SALARY > 2000 ?]
              JNB   NOTQUAL            / exit if comparison fails;
              LEA   BP,PB2             / set pointer to parameter block 2;
              CALL  COMPLITR           / call literal comparison routine;     [DEPT = 'SHOE' ?]
              JNB   NOTQUAL            / exit if comparison fails;
              LEA   BP,PB3             / set pointer to parameter block 3;
              CALL  ADD2               / tuple is qualified, update it        [ADD 500 TO SALARY]
    NOTQUAL   JMP   WAIT               / and wait until next tuple;
    MASKD     DC    X'8000'            / mask for deleted tuples;
    MASKT4    DC    X'0800'            / T4 marked mask;
    PB1       DC    A(TUPLE+SALARY)    / address of SALARY domain in buffer;
              DC    H'2000'            / external comparand;
              DC    H'4'               / comparison mode for "Greater than";
    PB2       DC    A(TUPLE+DEPT)      / address of DEPT domain in buffer;
              DC    H'8'               / length of the domain;
              DC    H'2'               / comparison mode for "equal to";
              DC    C'SHOE'            / external comparand;
    PB3       DC    A(TUPLE+SALARY)    /
              DC    H'500'             / external value to be added.

Figure-7 Intel 8086 Program for a RAP Instruction
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value  block  is  scanned,  the  actual  average  total 
circulation  time  will  be  considerably  less  than 
what  was  found  in  the  above  analysis  for  the  worst 
case  assumptions. 
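
The figures quoted for the current design can be reproduced with a few lines of arithmetic. The Python sketch below recomputes the bit time and the allocated time, and compares the number of circulations with and without waits. It assumes that "1 Kbit" means 1024 bits; the value NS = 1000 and all names are hypothetical, chosen only for illustration.

```python
import math

SHIFT_RATE_HZ = 300e3      # CM shift rate given in the text
BITS_PER_SHIFT = 16        # T_BIT = CM shift time / 16 (Appendix 1)
K = 4                      # subcells per cell in the current design
TUPLE_BITS = 1024          # assumption: "1 Kbit" tuples are 1024 bits long

t_bit = 1.0 / (SHIFT_RATE_HZ * BITS_PER_SHIFT)   # about 208 ns per bit
t_ls = t_bit * TUPLE_BITS                        # load (or store) time of one tuple
allocated = (K - 2) * t_ls                       # about 426 microseconds


def circulations(ns: int, n1: int, m: int = 1) -> int:
    """Number of target circulations when m * n1 source values are passed each time."""
    return math.ceil(ns / (m * n1))


def slowdown_ratio(m: int, k: int = K) -> float:
    """Approximate growth of one circulation when L is roughly m: (2 + m(k-2)) / k."""
    return (2 + m * (k - 2)) / k


# Hypothetical workload: NS = 1000 buffered source values, n1 = 100 per no-wait pass.
no_wait_cost = circulations(1000, 100) * slowdown_ratio(1)       # 10 circulations
with_wait_cost = circulations(1000, 100, m=4) * slowdown_ratio(4)  # 3 longer circulations
```

Under these assumptions the cost, measured in no-wait circulation times, drops from 10 to 7.5 when m = 4, which is the effect the analysis describes.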

Appendix  3 

Query  Routine  Example 


SUM      :  Selects and accumulates
COUNT    :  Selects and counts
MAX      :  Selects and finds the maximum
MIN      :  Selects and finds the minimum
AVERAGE  :  Selects and computes average


Insertion and deletion commands: Insert and delete
record occurrences.

DELETE  :  Selects and deletes record
           occurrences from the record type
INSERT  :  Inserts record occurrences into
           the record type


Consider the RAP instruction:

ADD [EMP (SALARY): MKED (T4) &
SALARY > 2000 & DEPT = 'SHOE'] [500]

which adds 500 to the salaries of those employees
who satisfy the accompanying qualification
expression. It is assumed that SALARY is a 2 byte
numeric domain and DEPT is an 8 byte literal
domain.

The query routine for this RAP instruction,
in the INTEL 8086 instruction set, can be given as in
Figure 7.

The routines MKED, COMPNUM2, COMPLITR and
ADD2 reside in subcell ROM and perform mark status
tests, value comparisons and addition updates on
the domains of the tuples according to the
information supplied with the associated parameter
blocks. Since the data qualification evaluation
and the update are done together, this instruction
would take only one cell memory circulation to
process all the tuples of a relation.
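
The control flow of this query routine can be paraphrased in a high level language. The Python sketch below mirrors the four tests and the update for one tuple; the class `EmpTuple` and its field names are hypothetical stand-ins for the tuple buffer, mark bits and parameter blocks, and are not part of the RAP design.

```python
from dataclasses import dataclass


@dataclass
class EmpTuple:
    """Hypothetical in-memory view of one EMP tuple held in a subcell buffer."""
    deleted: bool      # delete mark (tested through MASKD)
    t4_marked: bool    # T4 mark bit (tested through MASKT4)
    salary: int        # 2 byte numeric domain
    dept: str          # 8 byte literal domain


def query_routine(t: EmpTuple) -> bool:
    """Qualify and update one tuple; returns True if the tuple was updated."""
    if t.deleted:              # exit if deleted
        return False
    if not t.t4_marked:        # exit if not T4 marked
        return False
    if not t.salary > 2000:    # numeric comparison, mode "greater than"
        return False
    if t.dept != "SHOE":       # literal comparison, mode "equal to"
        return False
    t.salary += 500            # tuple is qualified, update it in place
    return True
```

Because qualification and update happen in the same pass over the tuple, a single call per tuple is enough, which corresponds to the single cell memory circulation mentioned above.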

It should be noted that it is possible to
construct query routines for data qualifications
and/or updates of any complexity.

Appendix  4 

Summary of the Instruction Set of the RAP
DBMS Assembler Language

Selection  and  retrieval  commands:  Implement 
selection  and/or  data  retrieval. 


MARK             :  Selects and tags
RESET            :  Selects and removes tags
READ             :  Selects and reads
CROSS_MARK       :  Maps between two record types
CROSS_COND_MARK  :  Maps between two record types
GET_FIRST_MARK   :  Cursor and mapping within a
                    record type
GET_FIRST        :  Cursor
SAVE             :  Selects and saves item in RAP
                    register


Data definition commands: Initialize, populate,
and delete a record type.

RELATION  :  Defines a new relation (record
             type). Size, type, length
             parameters for the data are
             declared. (Key attributes and
             access paths are defined if the
             software emulator rather than the
             actual machine is used.) User
             capabilities, access rights, and
             the protection parameters are also
             declared with the use of this
             command.

CREATE    :  Populates the database for the
             specific record types which have been
             defined by the RELATION command.

DESTROY   :  Deletes a record type

System commands:

AUTHORIZE      :  Grants access to the user via a
                  password
LOCK           :  Specified record types are locked
                  against concurrent accesses
RELEASE        :  Releases locks
SAVE_MARKS     :  Current mark bits of specified
                  relations are pushed onto stacks of
                  each tuple
RESTORE_MARKS  :  Restores marks by popping the
                  saved mark bits
LOCATE         :  Returns the node address of the
                  relation being searched
MOVE           :  Moves an entire or restricted
                  subset of a relation to the
                  specified site
STATUS         :  Performs dynamic status checking
                  for branching purposes
READ_MARKS     :  Same as READ, but output also
                  includes mark bits


Update commands: Perform selection and in-place
arithmetic and replacement updates.

ADD      :  Item1 ← Item1 + Item2 (or constant)
SUB      :  Item1 ← Item1 - Item2 (or constant)
MUL      :  Item1 ← Item1 * Item2 (or constant)
DIV      :  Item1 ← Item1 / Item2 (or constant)
REPLACE  :  Item1 ← Item2


Statistical (Set function) commands: Select and
compute functions in-place.


Register manipulation commands:

READ_REG   :  Reads out RAP registers
STORE_REG  :  Enters data into user registers
DEC_REG    :  Decrements specified register
              contents by one
INC_REG    :  Increments specified register
              contents by one
RADD, RSUB, RMUL, RDIV  :  Perform specified arithmetic
              operations on registers as:
              <reg> ← <reg> <ropr> <operand> where
              ropr is one of RADD, RSUB, RMUL, or
              RDIV.
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Decision  and  transfer  commands:  Control  program 

loops . 

TEST  :  Tests presence of tags within a
         record type

BC    :  Branch, conditional and unconditional

EOQ   :  End-of-query
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ABSTRACT 

A direct-execution model, based on the tree-structured
internal representation of the source-texts,
has been defined. It features a single intermediate
environment and two environment transfers:
the first one corresponds to a bidirectional translation
between the source-text and the tree-structured
internal form. The second one is a conventional
microprogrammed interpretative process on a
specialised hardware architecture.

In this paper, a full description of a hardware
architecture which directly holds the tree-structured
forms is given. Its characteristic features are
discussed, and the micro-control operations which
deal with the main tree-structured form concepts
(recursivity, top-down tree traversing, escapes)
are presented.


1  -  INTRODUCTION 

To solve the problems resulting from the semantic
gap, which arise in conventional computer
systems, new computer architectures have emerged
in the last few years. Their purpose is to support
one or more high level languages directly in
hardware. In this way, eliminating the order-codes
tends to close the gap between the high level language
and the physical structure of the host machine.

Although the Von Neumann architecture is increasingly
and rightly questioned, none of the proposed
high level language processors has been
commercially successful. We tried to analyse the reasons
for these failures [1,2], and it appears that the
attractiveness of the Von Neumann architecture resides in
its conceptual simplicity, whereas the suggested
solutions [3,4,5] are characterized by complex models,
difficult to understand and to implement, and often
leading to needlessly intricate ("gas-works") architectures.

Therefore, we have proposed a direct execution
scheme, based upon the definition of a class of
list-structured Directly Executable Languages (DELs),
which is derived from LISP [6]. The objective of this
scheme is to provide the implementation of high level
languages with a systematic support that is easy to
understand and to use [7].


1.1. The 3L-Model

A direct execution scheme with a single level
was defined, i.e. a scheme including only one intermediate
environment between the source-text and the
executional environment (fig.1).

Fig.1 - The 3L-model

. A first interactive processor, the editor, is responsible
for the communication between the external
environment (source-text) and the internal environment
(DEL).

. A second processor, the interpreter, is responsible
for the evaluation of the internal form through
the hardware operators.

. The 3L-machine (M3L) is the physical support of the
3L-model. Both processors are microprogrammed on
M3L with a high level microprogramming language
specialized in the expression of the emulation
processing : the Language for Emulation (LEM).

1.2. The 3L-form

The choice of the intermediate environment determines
the direct execution scheme. As we wished to
maintain the whole semantics of the source-text
while providing the interpreter with a form that is
easy to handle, we chose a list-structured internal form,
based upon LISP : the LISP-like-Languages (3L). The
3L form is prefixed and fully parenthesized. Although
its semantic power is very high, its syntax is
absolutely trivial and it offers a great systematization
for the internal representation of the programs.


The 3L form is represented within the memory
by a binary tree-structured form. This form is tagged;
its unit is the pair-cell :

    field   :  CAR  |  CDR  |  DES
    width   :  16   |  16   |  8     (bits)

The CAR field generally represents a left pointer,
the CDR field a right pointer, and the DES field gives
the description of the cell content, more precisely
for the representation of objects.


Example : Suppose that in the high level language
we have the operation f(x,g(y)). It can be expressed
in terms of the symbolic 3L form as
(f x (g y)), and stored within the pair-cell memory
as a structure of linked pair-cells.
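
To make the pair-cell encoding concrete, here is a minimal Python sketch that packs CAR, CDR and DES into one 40-bit word and stores (f x (g y)) as linked cells. The tag values `ATOM` and `LIST`, the use of address 0 as the empty pointer, the field order inside the word and the symbol table are all assumptions made for illustration; the paper does not specify them here.

```python
ATOM, LIST = 0x01, 0x02   # hypothetical DES tag values
NIL = 0                   # assumption: address 0 is reserved as the empty pointer


def pack(car: int, cdr: int, des: int) -> int:
    """Pack a pair-cell into a 40-bit word: CAR (16 bits), CDR (16 bits), DES (8 bits)."""
    if not (0 <= car < 1 << 16 and 0 <= cdr < 1 << 16 and 0 <= des < 1 << 8):
        raise ValueError("field out of range")
    return (car << 24) | (cdr << 8) | des


def unpack(word: int) -> tuple:
    """Split a 40-bit word back into its (CAR, CDR, DES) fields."""
    return (word >> 24) & 0xFFFF, (word >> 8) & 0xFFFF, word & 0xFF


class PairCellMemory:
    def __init__(self):
        self.cells = [0]      # cell 0 is the reserved NIL cell
        self.symbols = []     # hypothetical symbol table for atom names

    def _alloc(self, word: int) -> int:
        self.cells.append(word)
        return len(self.cells) - 1

    def atom(self, name: str) -> int:
        self.symbols.append(name)
        return self._alloc(pack(len(self.symbols) - 1, NIL, ATOM))

    def cons(self, car: int, cdr: int) -> int:
        return self._alloc(pack(car, cdr, LIST))

    def build(self, expr) -> int:
        """Store a nested Python list such as ["f", "x", ["g", "y"]]; return its address."""
        if isinstance(expr, str):
            return self.atom(expr)
        tail = NIL
        for item in reversed(expr):
            tail = self.cons(self.build(item), tail)
        return tail

    def show(self, addr: int) -> str:
        """Rebuild the symbolic 3L form by following CAR and CDR pointers."""
        car, cdr, des = unpack(self.cells[addr])
        if des == ATOM:
            return self.symbols[car]
        parts = []
        while addr != NIL:
            car, cdr, _ = unpack(self.cells[addr])
            parts.append(self.show(car))
            addr = cdr
        return "(" + " ".join(parts) + ")"
```

For example, `PairCellMemory().build(["f", "x", ["g", "y"]])` allocates four atom cells and five list cells, and `show` on the returned address gives back the prefixed, fully parenthesized form.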


Fig.2 - ARCHITECTURE OF M3L
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2  -  THE  GENERAL  STRUCTURE  OF  M3L 

The M3L project started with a systematic study
of the interpretation of LISP. First, we defined a
pseudo-machine, then we wrote a simulator, and
developed a microprogrammed LISP interpreter upon it.
The simulation measures* led to a new architecture,
which was defined for the M3L prototype,
presently in its completion phase.

2.1. Synoptic of M3L

The general organisation of the 3L-machine is
very simple. The resources are interconnected via a
single bus which determines the datapath. The datapath
is 16 bits wide, which sets the maximal size of the
prototype pair-cells memory (fig.2).

In the 3L-machine there are four categories of
registers :

Ai registers, i ∈ [0,15]
they are used for current work and information
transfers between microprocedures

Bi registers, i ∈ [0,255]
they serve as global registers for every microprocedure;
they contain the descriptors of the
currently emulated system

Ti registers, i ∈ [0,31]
they are flip-flops which give the status of the
system. They are global resources and some of
them can be set or reset by the programmer

Ri registers, i ∈ [0,3]
they make recursivity in LEM possible through
their locality.

2.2. The numerical processing

In the Von Neumann architecture, numerical
processing is prevalent. It is represented by the
central operator and the inputs/outputs. More and
more, it is integrated, especially in microsystems.
On the contrary, in a high level language
processor non-numerical processing is prevalent.
This is true for M3L, where the architecture is designed
according to the emulation processing. Of course,
it is still necessary to incorporate the elements of
numerical processing within this architecture.
Nevertheless, they take a marginal place in M3L and
they are entirely supported by a single LSI family
(AMD 2900).

. The arithmetical processor

Most of the arithmetical functions of the 3L
machine are performed by a monolithic processor
(AM 9511). This processor relieves the machine of
all the corresponding micro-software, which is of minor
importance for emulation. It can be viewed as a peripheral
of M3L; it runs in parallel, and it is interfaced
through the general bus. The main operations performed
by the AM 9511 are :

- 18 data manipulation operations : fixed-float
conversions, read, write,

- 5 fixed arithmetical operations (16 and 32 bits)

- 4 float arithmetical operations (32 bits)

- 11 secondary operations (32 bits float) :
sin, cos, x^y, ...

. The arithmetical and logical unit (ALU)

The ALU of M3L is built from four AM 2903 LSI
chips. Owing to the use of an arithmetical processor,
its task is very small : it has to manage the
Ai registers, and it performs data comparisons, which
are typical tasks of the environment transfers.

. Inputs/outputs

The inputs/outputs system is built from an 8
bit wide peripheral minibus to which the interface
adapters for asynchronous communications are connected.
These chips perform the standard control
functions according to the CCITT V24 Standard. The
minimal version of M3L includes an ACIA for driving
the TTY, and another for interacting with a microsystem
responsible for the management of inputs/outputs
and disk-files.
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Fig. 3  -  THE  PERIPHERAL  MINIBUS 


3 - THE MICROCONTROL


Microprograms, written in LEM, are compiled to
produce fixed microcode. The vertical microprogramming
used for this implementation has two advantages :
the compiler's task is simpler,
and the size of microinstructions can be shortened.
This reduces the amount of microcode to swap during
control switches.

The great diversity of control signals to provide
(in particular, to control the tri-state bus)
has led to a two-level microprogramming. The method
used here is different from the nanoprogramming of the
QM-1**, which uses a second level of microprogramming.
To execute a microinstruction through the datapath
one must :

1. provide some parameters :

- number of Ai, Bi, Ri ...

- long, short constant

- number code of branch operation, of
ALU function ...

2. define an action to execute, i.e. state a
particular data transfer through the datapath.

The second part, fixed for a given action, still
requires many more bits for the direct control of
gates. The repetition of such a long "dead-bit" sequence
is cumbersome; thus, the action to be executed
is specified by the second level of microprogramming,
in a single horizontal word where each
control bit directly drives the gates : it is the
executive.

The format of a fixed-size microinstruction is
then : an operation code (OPC) followed by the parameters Pi.

OPC represents the code number of an executive, and
the Pi's are the arguments.

The size of the microinstructions is 32 bits.
Up to 256 executives can correspond to the
operation code (OPC). Theoretically, a great number of
executives can be defined, but in practice the facilities
of a datapath are never completely put to
use : our simulation of a LISP system [1] required only 60
executives. The executives reside in a fast
PROM memory (access time ≈ 35 ns) with 256 words of 116-bit
length.

3.2. Description of the microcontrol words

• The microinstruction parameters

There are 10 available Pi parameters. A microinstruction
is an assembly of some of these parameters.
The assembly rules are stated by the place of each
parameter within the 24-bit parameter field.
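
A short Python sketch of how a 32-bit microinstruction word could be separated into its operation code and its 24-bit parameter field. Placing the operation code in the top 8 bits is an assumption (it is consistent with 256 executives and with a parameter field numbered from bit 23 down to bit 0); the function name is illustrative.

```python
def split_microinstruction(word: int) -> tuple:
    """Return (opc, params) for a 32-bit microinstruction word.

    Assumption: bits 31..24 hold the operation code that selects one of
    up to 256 executives, and bits 23..0 hold the parameter field.
    """
    if not 0 <= word < 1 << 32:
        raise ValueError("a microinstruction is 32 bits wide")
    opc = word >> 24            # 8-bit executive number
    params = word & 0xFFFFFF    # 24-bit parameter field
    return opc, params
```

The executive itself would then be fetched from the 256-word PROM at index `opc`; the layout of the parameters inside `params` depends on the microinstruction format and is not modelled here.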


The typical formats are :

- Ai registers : three different places are available for
the Ai registers

- Ri registers : four different places are available for
the Ri registers

- Bi registers

- Ti registers : the Ti registers are associated with the
BRanch field

ESC is the escape tag and IND specifies the stop
mode for the return on escape condition : <, =, >

• The executive word

The executive is divided into 14 sub-fields, each of which
may or may not be attached to a particular control
task upon the datapath. The following
sub-fields are illustrated below.

    field name    CONTROL OF
    µPC           microprogram counter
    DES           MPX (Shift and Mask)
    STK           Stack memory
    MSEL          Memories selection
    ALU           AMD 2903 ALU
    S(A,B,C,D)    source selection for the general
                  bus transfers
    R(A,B,C,D)    receptor selection for the general
                  bus transfers

(A,B,C,D specify the four different transfer modes,
via the general bus)


3.3. Operating

The cycle time of the M3L microinstructions is
fixed at 500 ns. It may seem long for a modern
technology, but with regard to the power of the
microinstructions it is a good speed. The cycle starts with
the fetch of the microinstruction (100 ns); it includes
some register moves, and always a main control
phase which is 200 ns long.

As the case may be, this phase performs :

- an access to the pair-cells memory

- an arithmetical operation on the ALU

- a context switch with an access to the stack memory

- a refresh cycle.


A suspension is a request for a temporary halt
of the current microprogram. During this halt a single
microinstruction is performed. The suspension
takes place when the latch is loaded : an encoder
detects the suspension and yields its number. As
there are 8 different suspensions, the first 8 executives
will therefore be regarded as suspension
handlers.

One of these suspensions will be the refresh
request for the dynamic MOS memory. The aim of this
suspension is to perform a refresh cycle without
modifying the current context.


3.5. Interrupts and microinstruction tracing

Another suspension will be associated with the
interrupt request. It has to save the current context
without changing the microprogram counter
(µPC), and it has to branch to the interrupt handler.
At the hardware level, the management of interrupts
is achieved with the help of two interrupt
controllers (AM 2914) which allow the handling of
16 interrupt levels.

To each microinstruction word, a tracing byte
is concatenated, where each bit is associated with a
microsoftware interrupt. The bits are set at the
compiling stage. Thus, when running, they activate
the corresponding interrupts, which are then handled
sequentially, according to their priority level.

They can be enabled or disabled in software. They
are used in microprogram debugging and for the M3L
prototype measurement.

4  -  THE  PAIR-CELLS  MEMORY 

The pair-cells memory is the main resource of
M3L. It is built with dynamic MOS memory. Each chip
contains 16K x 1 bits and its access time is 150 ns.
The pair-cells memory is organized in 40-bit wide
words which are divided into three fields of
16, 16 and 8 bits respectively.


Fig.4 - THE TWO-LEVELED MICROPROGRAMMING
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Data  moving  in  the  write  mode 


[Figure : THE PAIR-CELLS MEMORY, showing the WRITE path (MSK) and the READ path (MSK, SHT)]


The access to the pair-cells memory, in the
read/write mode, is done through the general bus.
With respect to data moving, there are two kinds of
access in the read mode and one in the write mode.
As for control, there are three kinds of access.

4.1. Access to the pointer field

• Data moving in the read mode
. single transfer : The LEM syntax is

The  first  register  contains  the  address  of  the 
memory  cell  to  be  modified,  and  the  second  one  con¬ 
tains  the  information  to  be  moved. 


.  Access  in  the  control  mode 


D$ks 


ACTIONS 

LIST 


The register contains the address of the
referenced memory word. After reading, the content
of the corresponding Fi field is added to the
microinstruction address register. In most situations,
the access in the control mode concerns the
descriptor field of the memory cell. Hence, this
multiple branch operation enables the 3L form to be
decoded. More details on this microinstruction are
given in section 5.3.

4.2. Access to the descriptor field

Whereas the access to the pointer fields (F0,
F1) is fixed, the access to the descriptor field
(F2) is more versatile. As a matter of fact, for a
given emulated system, this field can arbitrarily
be divided into contiguous or overlapping sub-fields.
These sub-fields can be accessed in the read/write
or control mode, like the pointer fields.

Ensuring the access to a sub-field of DES needs
a special device to select the field. This device
was discussed in a more general situation [7]. Here it
is applied to a byte only, thus it is very simple.
There is a mechanism for the reading operation, and
another mechanism, strictly symmetric, for the writing
operation [7]. Therefore we will only describe the
fetch mechanism.


The first register specifies the receiver and the
second one contains the address of the emitter.

Example : A2 ← F1(R3) means :

"read the F1 field of the pair-cell whose address
is stated in the R3 register, and store it into
the A2 register"

When compiled, it yields the following microinstruction :

    PC-READ1 | Ri | Aj | Ti | T | BR

. double transfer

The first register contains the address of the emitter,
whereas the second and the third registers deal
with the receivers of the fields F0 and F1.

feten  A1  into  A2  and  R3  is  equal  to  [  ”  * 

It yields :

    PC-READ2 | Ai | Ri | Aj | Ti | T | BR

Fig.6 - PRINCIPLE OF THE DES ACCESSING MECHANISM


A first logical level, MUT, performs a circular
shift on the descriptor byte. This shift is performed
in a purely combinatorial and parallel manner by a
special chip ($GN 0243). A second logical level masks
the irrelevant part of the descriptor byte. The selection
of a field requires the specification of a
shift (0-7) and a mask (a byte). This information
is included in the executive of the microinstruction
which fetches the sub-field.


Pi  takes  its  input  arguments  into  the  Ai  regis¬ 
ters  and  outputs  its  results  to  ?z  via  the  Ah's. The 
object  of  the  Ri  registers  is  to  maintain  the  value 
of  \\  registers  in  the  environment  of  Pi,  this  value 
does  not  have  to  be  erased  by  the  application  of  n- 


Example : DESCRIPTOR SUB-FIELDS

    MASK    SHIFT
    # FF    0
    # 07    0
    # 0F    4
    # 01    3


The combinatorial nature of the select mechanism
of the descriptor sub-fields enables the M3L "memory
word" to be viewed as a sequence of fields
Fi (i = 0, ..., n), which are equally accessible in the read,
write, or control mode, in a single microinstruction
cycle. This emphasizes the thorough attention
which was paid to the access to the intermediate
environment on M3L.
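
The shift-and-mask selection can be sketched in a few lines of Python. The direction of the circular shift is an assumption (a right rotation is used, so that the selected sub-field ends up right-justified), and the function names are illustrative.

```python
def rotate_right8(byte: int, shift: int) -> int:
    """Circular right shift of an 8-bit value (first logical level)."""
    shift %= 8
    return ((byte >> shift) | (byte << (8 - shift))) & 0xFF


def select_subfield(des: int, shift: int, mask: int) -> int:
    """Select a descriptor sub-field: rotate the byte, then mask the irrelevant bits."""
    if not 0 <= des <= 0xFF:
        raise ValueError("the descriptor is one byte")
    return rotate_right8(des, shift) & mask
```

With the descriptor byte 0xB7, the four (mask, shift) pairs of the example table give the whole byte (0xFF, 0), its three low-order bits (0x07, 0), its high-order nibble (0x0F, 4) and the single bit at position 3 (0x01, 3).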


5  -  THE  CONTROL  UNIT 

Beyond the special organization of the main memory,
the second feature of the M3L architecture
concerns its control unit. As a matter of fact, it
has to support the recursivity mechanism, which is a
fundamental aspect of the emulation functions. The
LEM language is recursive, and this is conveyed
through the hardware structure at the level of the
control unit of M3L.

A LEM module is composed of small procedures
which are independent and not ordered. They can
refer to each other and even to themselves. In control
switching from one microprocedure to another, the Ai
global registers are used for parameter passing and the
Ri local registers are automatically saved.


An automatic escape mechanism is added to the
recursivity. Writing top-down recursive parsers
requires such devices, which are similar to software
interrupts (like the "ON conditions" of PL/1).

An escape microinstruction performs a return
operation to the last call microinstruction which
has set, in the recursivity stack, a tag number
(Ei constant) equal to the tag number of the escape
microinstruction. Escapes and recursivity are two
concepts which are closely related, hence they have
been merged in order to offer a better systematization
of the control transfer between microprocedures.
It is thus stated that, in LEM, calls are recursive
and returns are escapes.
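
The interplay of recursive calls and tagged escapes can be illustrated with a small Python model. It is only a sketch under stated assumptions: the three hardware stacks are merged into one Python list, the class and method names are invented for illustration, and an unmatched escape raises an error, a case the paper does not discuss.

```python
class ControlUnitModel:
    """Toy model of call and escape sequencing (not the M3L hardware itself)."""

    def __init__(self):
        self.upc = 0             # microinstruction address register
        self.r = [0, 0, 0, 0]    # local registers R0..R3
        self.stack = []          # entries: (saved R registers, escape tag, saved uPC)

    def call(self, sb: int, esc: int) -> None:
        """Recursive call: save the context with its escape tag, then branch to SB."""
        self.upc += 1                                   # return address
        self.stack.append((list(self.r), esc, self.upc))
        self.upc = sb

    def escape(self, ei: int) -> None:
        """Pop contexts until one carries the tag Ei, then resume after that call."""
        while self.stack:
            saved_r, esc, saved_upc = self.stack.pop()
            self.r = saved_r                            # the context is popped each time
            if esc == ei:
                self.upc = saved_upc
                return
        raise RuntimeError("no call with a matching escape tag")
```

For instance, after three nested calls tagged 1, 2 and 2, `escape(1)` unwinds all three contexts in one operation and resumes after the outermost call, whereas `escape(2)` would only undo the innermost one; an ordinary return is the special case where the tag matches the most recent call.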

The  control  unit  is  illustrated  in  the  fig. 7. 

Its  main  components  are  : 

ESC stack : saves the escape number
when a recursive call occurs

COMP : when an escape microinstruction is performed,
indicates whether the escape number given as an
argument matches the escape number read from
the ESC stack

MPX3 : 32x1 multiplexer. The selection is made
according to the Ti number, passed as
an argument, while T determines whether or not
the output is inverted

µPC stack : save stack for the microinstruction
address register

ADDER : simple adder used to perform relative
branches

MPX1, MPX2 : input multiplexers of the
microinstruction address register

µPC : microinstruction address register

µPC control : produces the control signals which
correspond to the operation code of the
current microinstruction.

The  control  unit  microinstructions 

The five basic microinstructions dealing with
the sequencing of microprograms are : the continuation,
the conditional branch, the multiple
branch, the recursive call and the escape.

1.  Continuation 


[ CONTINUATION | Subroutine address (SB) ]

µPC ← SB

The continuation microinstruction makes it
possible to branch between microprocedures
without any push operation.
2. Conditional branch


if (T = 1 and Ti = 1) or (T = 0 and Ti = 0)
then µPC ← µPC + BR + 1
else µPC ← µPC + 1

The displacement BR is signed. The sign bit is in
the most significant position.

3.  Multiple  branch 


µPC ← µPC + 1 + Fi(Aj) (la 2)

This microinstruction enables decoding based
on a sub-field (Fi) of the descriptor.


4. Call

* µPC ← µPC + 1

* the current context is saved onto the stacks :

    Ri stack ← Ri (i = 0,3)

    ESC stack ← Esc

    µPC stack ← µPC

* µPC ← SB

5. Escape

* The context is popped from the stacks :

    Ri (i = 0,3) ← Ri stack

* if Ei = ESC stack then µPC ← µPC stack

The escape microinstruction is executed as many
times as necessary until it finds an escape number
matching the one specified in the Ei field. It
scans the control unit stack in search of its
corresponding context. Hence, it generalizes the return
mechanism.
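
The sequencing rules above can be illustrated by a small software model. The following is only a sketch under our own assumptions: the class name, the method names and the use of Python lists for the three stacks are hypothetical and are not part of M3L, and the multiple branch is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    """Toy model of the sequencing part of the control unit (names are hypothetical)."""
    upc: int = 0                                            # microinstruction address register
    r: list = field(default_factory=lambda: [0, 0, 0, 0])   # local registers R0..R3
    r_stack: list = field(default_factory=list)             # Ri stack
    esc_stack: list = field(default_factory=list)           # ESC stack
    upc_stack: list = field(default_factory=list)           # microinstruction address stack

    def continuation(self, sb):
        """Branch to microprocedure SB without any push."""
        self.upc = sb

    def conditional_branch(self, t, ti, br):
        """Relative branch by the signed displacement BR when T equals Ti."""
        self.upc += br + 1 if t == ti else 1

    def call(self, sb, esc):
        """Save the context, tagged with the escape number, then jump to SB."""
        self.upc += 1
        self.r_stack.append(list(self.r))
        self.esc_stack.append(esc)
        self.upc_stack.append(self.upc)
        self.upc = sb

    def escape(self, ei):
        """Pop contexts until one carries the tag Ei; return False if none does."""
        while self.esc_stack:
            self.r = self.r_stack.pop()
            tag = self.esc_stack.pop()
            return_address = self.upc_stack.pop()
            if tag == ei:
                self.upc = return_address
                return True
        return False
```

With two nested calls tagged 1 and 2, an escape with tag 1 discards the inner context and resumes after the outer call, which is the sense in which the escape generalizes the return.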


CONCLUSIONS 

The first remark that we can make about the
M3L architecture concerns numerical processing :
it is not absent, since without it there
would be no execution, but it takes second
place. This does not imply that M3L is unable
to perform this kind of processing efficiently. On
the contrary, owing to advanced integration
capabilities, a single LSI family provides the
functions of the conventional architecture very
efficiently.


Whereas numerical processing can be easily
integrated, this is not true of non-numerical
processing. Indeed, the latter deals mostly
with the organization of information. It does
not need any special processor, but it is expressed
through the distribution of resources in the
computer architecture. In M3L, special attention
was paid to the organization of the resources and
in particular to memory management : the M3L
architecture is based upon two memories, the
pair-cells memory and the stack memory.

The M3L project started in September 1977. The
prototype, designed during 1979, is presently under
construction and will be operational in June
1980. The complete machine, with the input/output
interfaces for connecting the TTY and for disk
management, consists of five boards conforming to
European standards. The prototype is equipped with
a 64 K pair-cells memory and a 16 K stack memory,
which account for 70 percent of the chips.

The architecture of M3L is simple. Just like the
Von Neumann architecture, it scales in direct proportion
to the size of the memory. Therefore, it can serve
as a basis for a line of general host systems. The
present implementation corresponds to a mid-range need,
but a new version of M3L, with a virtual pair-cells
memory, is being studied in which the datapath will be
24 bits wide. Just like the Von Neumann architecture, it
offers a systematic approach to the implementation
of the direct execution scheme, which makes it easy
to understand and to use. Consequently, it has the
features required for wide diffusion. From this
point on, there is no doubt that such an architecture,
and more generally x-architectures, will
supersede conventional sequential computer systems.
ABSTRACT

This paper presents a methodology for defining
a high level machine for a real time language.
First, the choice of an indirect execution computer
architecture for this class of language is discussed.

Apart from the algorithmic aspect already examined
in previous implementations, this type of language
creates problems of management in a multi-task
environment, of defining the concept of interruption
on a high level machine, and of implementing complex
systems which require a structured design.

An application of the defined methodology is
described, which consists of the definition and
implementation of a high level machine for the LTR
language, with emphasis on the handling of problems
specifically linked to real time.

INTRODUCTION 

The design of a general-purpose computer usually
precedes the design of the software tools it is
intended to support; software and hardware interfacing
is performed by instructions in the machine language,
which manage the physical resources of the computer. The
implementation of a high level language on a general
purpose computer therefore calls for
translators which produce (compilers) or use
(interpreters) these instructions.

The semantic gap between the external form of a
high level language and the machine language entails
very complex, expensive translators which are not
necessarily free from errors.

In the last 20 years, many high level languages
adapted to programmers' needs have appeared and
have been implemented with the help of compilers.

At present, a large number of high level languages
exist which cover most programming needs.

The definition of a data processing system (computer
+ language) may, therefore, move in a new direction :
given a chosen programming language, let us
define a computer architecture associated with this
language.

This approach is attractive for two fundamental
reasons :

- choice of the language which best expresses the
problems to be dealt with (FORTRAN for scientific
calculations, COBOL for management decisions,
PASCAL for general applications, ...)

- the efficiency of an architecture designed
specifically to support the language.


In recent years, many studies have been carried
out based on languages which are essentially
algorithmic (FORTRAN, PASCAL, EULER, BASIC, SYMBOL, ...).

The study described in this paper concerns the
definition of an architecture specialized in the
execution of a real-time system. Taking a real
time application into account introduces
some specific problems :

- the programming system is composed of very
numerous (> 300) interacting programs; therefore, on
the one hand, there is an extremely large volume
of source programs (in the region of several
hundreds of thousands of instructions) and, on the
other, the problems of synchronization between the
different tasks are crucial

- task switching must be efficient so that an
internal or external event can be taken into account
as quickly as possible

- the computer must allow separate execution of the
different tasks so as to ensure a structuring
of the application.

1 - INDIRECT EXECUTION ARCHITECTURE

An indirect execution architecture is made up of
two distinct parts :

- a software module, which produces an intermediate
language from the source language

- a hardware module, which executes this intermediate
language.

The crucial point of this approach is the definition
of the intermediate language (IML), which must
be sufficiently close to the source language if the
compiler is to remain simple, and sufficiently close
to the hardware if execution is to be efficient.

It follows, therefore, that there cannot be a
general purpose IML adapted to every machine language
and architecture. The definition of such an
architecture must, therefore, start with the definition
of this intermediate level.

Separate compilation of modules thus requires
a linkage editor to generate an executable system.

Two solutions may be envisaged :

- static link editing takes up the concepts
which exist on conventional machines and produces
an executable module

- dynamic link editing is carried out at
execution time; in this case, when the resident
system meets an external reference, it must load
the module into central memory and start its
execution. This procedure, which includes an address
computation, is costly in time.
The choice between these two techniques depends
on the source language organization and the
constraints on execution time.
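
The difference between the two link editing schemes can be sketched as follows. This is an illustration only: the module names, their sizes and the functions shown are hypothetical, and address computation is reduced to assigning consecutive base addresses.

```python
# Hypothetical backing store: module name -> size in words.
MODULE_STORE = {"main": 40, "io": 25, "clock": 10}


def link_static(names):
    """Static link editing: every module gets its base address before execution."""
    table, base = {}, 0
    for name in names:
        table[name] = base
        base += MODULE_STORE[name]
    return table


class DynamicLinker:
    """Dynamic link editing: a module is loaded when first referenced."""

    def __init__(self):
        self.table = {}      # modules already resident in central memory
        self.next_base = 0   # next free address
        self.loads = 0       # number of load + address computations performed

    def resolve(self, name):
        if name not in self.table:          # external reference met at run time
            self.table[name] = self.next_base
            self.next_base += MODULE_STORE[name]
            self.loads += 1
        return self.table[name]
```

In this model the static scheme pays the whole cost once, before execution, whereas the dynamic scheme pays it on the first reference to each module, during execution.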

Direct execution computer architecture, on the
other hand, can support the execution of a high
level language without any change to the original text.
This approach presents many advantages (elimination
of the entire software system : the compiler, the
linkage editor, the loader; interactive program
debugging) for a certain type of application, but this
scheme seems difficult to implement for complex
systems, notably for multi-task real time systems.
For example, the definition of interruptible
points in such a scheme is rather delicate : an
interruption can be taken into account either at fixed
points in the execution of a source instruction (at the
beginning or at the end), in which case the masking
time may become too long to comply with the system
specifications, or at each analysed token, in
which case the processor context may become too
voluminous and context switching inefficient.

2  -  HIGH  LEVEL  ARCHITECTURE  FOR  A  REAL  TIME  LANGUAGE 

The  need  for  efficient  execution,  the  management 
of  a  multi-task  environment  and  the  complexity  of 
the  real  time  systems  involved  lead  to  the  choice 
of  an  indirect  execution  architecture  to  support 
the  execution  of  these  systems. 

This methodology, essentially interpretive, combines
the advantages of the compiling and interpretation
techniques.

The source text is translated into a coded text,
compact and syntactically correct, whose execution
may be restarted, postponed or linked with other
modules.

The intermediate text is interpreted with the
help of microprogramming techniques on a data path
adapted to its interpretation.

This methodology avoids the two basic criticisms
which are levelled at compilation and interpretation.
The compiling phase is simple, since it does not
perform code generation and optimization as in classic
compilers. Moreover, the text produced is
independent of machine resources (memory, registers, ...)
and the semantics of the instructions are close to
the source language.

The interpretation of such a language level may
be efficient thanks to microprogramming. Classic
programmed interpreters were not very efficient, as
they resided in central memory and acted on
rudimentary data paths (adders, registers). On the
other hand, a microprogrammed interpreter resides in
control store (with an access time about 10 times
faster) and present day technology allows the creation
of data paths better adapted to interpretation.

2.1.  Intermediate  machine  language 

The compilation phase must make the source text
directly interpretable. The properties of such a DEL
(Directly Executable Language) have been largely
defined by L.W. HOEVEL. This phase comprises, therefore,
a syntactic and semantic analysis of the
source text, symbol processing, processing of forward
references and labels, and the prefixing (or
postfixing) of the instructions. This processing
may be defined as a transfer from a concrete machine
(source text), defined by a concrete grammar, to an
abstract machine (DEL), defined by an abstract grammar,
used by the interpreter to execute the abstract


The form of the IML is determined by the nature
of the language; however, some characteristics may
be singled out. The transfer of the source program
into the virtual machine brings about a change of
environment. An intermediate environment may be
composed of three types of space :

- program space

- descriptor space

- data space

The  program  written  in  IML  is  a  finite  series  of 
binary  fields,  of  varying  length.  These  fields  are 
the  operation  codes,  operand  identifiers,  descriptor 
space  references  or  constants. 

The  descriptor  space  contains  all  the  semantic 
information  on  the  data,  and,  notably,  the  type  and 
the  access  mode  to  the  data  space. 

The  data  consist  of  information  of  varying  length. 
They  represent  arithmetic  values,  texts,  system  in¬ 
formation  (events,  semaphores)  or  procedural  para¬ 
meters. 
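
The role of the three spaces can be illustrated by a very small interpreter. The sketch below rests on our own assumptions: the dictionary layout of the descriptors, the tuple form of the program fields and the function `run` are hypothetical, and only two operation codes are handled.

```python
def run(program, descriptors, data_size):
    """Interpret a tiny IML program and return the resulting data space."""
    data = [0] * data_size   # data space: raw values only

    def fetch(operand):
        # A constant is used as is; a name goes through its descriptor.
        if isinstance(operand, int):
            return operand
        return data[descriptors[operand]["address"]]

    for opcode, target, *sources in program:
        if opcode == "AFF":
            value = fetch(sources[0])
        elif opcode == "ADD":
            value = fetch(sources[0]) + fetch(sources[1])
        else:
            raise ValueError(f"unknown operation code: {opcode}")
        data[descriptors[target]["address"]] = value
    return data


# Descriptor space: semantic information (type, access to the data space).
DESCRIPTORS = {
    "x": {"type": "integer", "address": 0},
    "y": {"type": "integer", "address": 1},
    "z": {"type": "integer", "address": 2},
}

# Program space: operation codes followed by operand fields.
PROGRAM = [
    ("AFF", "x", 5),
    ("AFF", "y", 7),
    ("ADD", "z", "x", "y"),
]
```

Calling `run(PROGRAM, DESCRIPTORS, 3)` fills the data space without the program ever containing a data address: all access goes through the descriptor space.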

2.2. Characterization of interpretation processing

Interpretation processing comprises three types
of processing :

- organic processing, associated with the management
of the tasks making up the system (activation,
deactivation) and with the management of machine resources

- formal processing, associated with execution
control, which manages the execution of a task

- effective processing, associated with the final
execution of the instructions.

The central processing units of present-day
computers are designed solely for the execution of
effective processing.

A high level architecture must, therefore, be
made up of hardware structures which efficiently
support formal processing and organic processing.
These structures must permit a description of program
algorithms at a macroscopic level, that is, at the
level of the algorithmic logic.

Effective processing, on the other hand, permits
a description of the algorithms at a microscopic
level, that is, at the level of the realization of functions.
APPLICATION

HIGH LEVEL ARCHITECTURE FOR THE LTR LANGUAGE*

LTR is a ten-year-old real time language whose
applications are now implemented on classic computers
(MITRA, IRIS, ...) through the intermediary of a
compiler which produces a symbolic text which must be
assembled on the target machine.

This implementation scheme is not very efficient,
either at the compiler level or at the code generation
level.

On the other hand, this language is complete
enough to be able to express most of the problems
of a real time application. Therefore it has been
chosen by several departments of the French Defense
Department for writing real time systems.

The problem is the definition of a machine
architecture which can support its execution efficiently.
We shall, therefore, examine an indirect execution
computer architecture to execute LTR, even
though this is a compiler oriented language.

1. PRESENTATION OF THE LANGUAGE

LTR, Real Time Language (Langage Temps Réel), is
a high level programming language intended for the
construction of systems. It has a highly structured
organization, shown by a partition into ARTICLES at
the highest level. An LTR system is a set of ARTICLES.

1.1.  Types  of  articles 

Data articles are of three types :

- DATA ARTICLE : data shared by a program and its
subroutines

- GLOBAL DATA ARTICLE : data common to the system
data set

- SYSTEM DATA ARTICLE : data specific to the system
environment.

The processing articles describe the algorithms
concerning the data declared in the data articles
or in the processing articles.

There are three types of processing articles :

- PROCEDURE ARTICLE : corresponds to the concepts
of subroutines or functions

- PROCESS ARTICLE : describes a process running in
a multi-task context (concept of
software task)

- INTERRUPT PROCEDURE ARTICLE : describes a process
whose execution is tied to the
interruption system (concept of an
immediate task).

1.2.  Structure  of  a  LTR  system 

Figure 1 describes an LTR system. A task may be
compiled separately, the compilation unit being :

<SYSTEM DATA ARTICLE><GLOBAL DATA ARTICLE>*<EXTERNAL
GLOBAL PROCEDURE>*<EXTERNAL PROCESS>*<task>

Program procedures may be called only by those
of the same task.

A task may activate another task and take back
control at the end of its execution (closed call) or
lose this control to the advantage of a task with
higher priority (open call).

* This work is supported by the Direction des Recherches
et Etudes Techniques (DRET) of the French Defense
Department, at the Department of Computer
Science of the Paul Sabatier University and the
Department of Computer Engineering (ONERA-CERT) of
the Centre d'Etudes et de Recherches de Toulouse.


The implemented system must ensure local procedure
recursion and task reentrancy.


[Figure: GLOBAL ARTICLES (SYSTEM DATA, GLOBAL DATA, GLOBAL
PROCEDURES); PROCESS ARTICLES, each PROCESS with its DATA
ARTICLES and PROCEDURE ARTICLES; INIT PROCESS with its DATA
ARTICLES and PROCEDURE ARTICLES; START PROCESS ARTICLE]

Fig. 1 : STRUCTURE OF AN LTR SYSTEM


The scope of the identifiers outside the processing
article is as follows :

. the only accessible data are those declared in :

- the task DATA ARTICLES

- GLOBAL DATA ARTICLES

- the parameters

. the only usable procedures are :

- the task PROCEDURE ARTICLES

- the GLOBAL PROCEDURE ARTICLES

Inside the article, the classic block structure
rules must be respected.

1.3.  Principle  of  data  allocation 

In LTR, the type of article and the data organization
lead to different forms of data storage allocation.

A. Static and permanent

These are the data, tables or structures declared
in a GLOBAL DATA ARTICLE or in a DATA ARTICLE.
The storage space is reserved by the compiler and
the lifetime is linked with that of the task.

B. Automatic allocation

These are the data, tables or structures locally
declared in the processing articles. The data are
dynamically initialized and data overlay takes place
according to the block structure. The lifetime
is linked to the internal block in which they are
declared.

C. Controlled allocation

This concerns virtual data pointed to by the user.
The data are described in a data or processing
article; the link between the description and the data zone
to which it applies is made by the execution of
pointer manipulation instructions or by storage
allocation.

D. Chain allocation

This concerns sets pointed to by the user but whose
chaining is automatically ensured by the allocation
mechanism. The data are described in a GLOBAL
DATA ARTICLE.

This presentation of the language fixes the
constraints on defining memory management for an LTR
machine. We shall present the solution chosen for
implementing such a system below.

CODE   Param 1    Param 2           Param 3 and others  FUNCTION                                  Notes

AFF    operand    opde or constant                      Assignment (affectation)
ADD    operand    opde or constant  opde or constant    Param 1 ← Param 2 + Param 3               (1)
LSS    operand    operand                               Comparison of Params 2-3 and assignment   (1)
                                                        of the (boolean) result to Param 1
IF     address 1  address 2         address 3           (f 1)                                     (2)
FOR    address 1  address 2                             (f 2)                                     (2)
WHILE  address 1  address 2                             (f 3)                                     (2)
CALL   operand                      (opde or constant)  Param 1 : descriptor of procedure
                                                        Param 3 : parameter list
CALLP  operand    entry TD                              Entry TD : address of a TASK DESCRIPTOR
                                                        Params 2-3 : identical
NEW    operand    operand           opde or constant    Insertion of an element in a set
                                                        Param 1 : set
                                                        Param 2 : insertion address
                                                        Param 3 : name of element to be inserted

(f 1) IF <a1><a2><a3> <exp.bool.block> <THEN block> <ELSE block>

(f 2) FOR <a1><a2> <incr.block + test> <FOR block>

(f 3) WHILE <a1><a2> <exp.bool.block> <WHILE block>

(1) Parameter 1 may be an intermediate variable produced by the compiler

(2) The addresses are N-uple addresses

2. INTERMEDIATE LANGUAGE (DEL) FROM LTR

An intermediate instruction is a byte string
of varying length called an N-uple.

An N-uple may be an expression (OPERATOR, (OPERAND)*)
in which the number of operands is fixed
only by the LTR instruction specifications.

The definition of the operator codes is fixed by the
LTR instructions; each instruction has been regrouped
in the form of an N-uple, while preserving
all the semantics contained in the source
instruction.

The table above gives some examples of N-uples.

In the operand part, we may find either a constant,
an N-uple address, or a data descriptor address.
The operand is prefixed by a directive which
prescribes the descriptor type :

(ch)  'corchain' for bit chains
(ix)  table index
(rf)  reference of a structure field
(pt)  pointer to a set
(ct)  constant
(op)  operand
(cv)  conversion

The DEL-LTR may be summarised schematically as
follows :

IML      ::=  (N-uples)*
N-uple   ::=  (OPCODE, (OPERAND)*)
OPERAND  ::=  (CTSI) (OPDE), (CONV/MOD, (CONV))
OPDE     ::=  op, H.S.H., (INDEX)
CONV     ::=  cv NUMBER
MOD      ::=  ct, (CTSI/OPDE, CTSI/OPDE) / pt, H.S.H., CONV
INDEX    ::=  ix, H.S.H. / indexi, CTEi
CTSI     ::=  ct, CTEi
H.S.H.   ::=  address of descriptor
CTEi     ::=  immediate constant

This intermediate form is very close to the
source language. The semantic information contained
in an LTR instruction has been coded in the
intermediate instruction so as to facilitate
interpretation : the interpreter analyses the instruction
prefixes, namely the operation code, the execution and
control addresses, and the operand directives.

All non-constant variables are addressed through
a descriptor which contains the set of information
characterising the data used by the interpreter.

The basic descriptor is a 10-byte word which may
have extensions for complex operands (table, structure,
process descriptions). In the standardised
form, its fields are :

NAME | INDIC | BASE | DEPL. | TYPE | STRUCT | SIZE | SCALE | EXT

NAME : name of the variable; this information allows
the state of the variables to be displayed
during the debugging phase

INDIC : data implantation type : global, local, parameters

BASE, DEPL. (displacement) : data implantation address

TYPE : Integer, Real, Fixed, Index, Character string,
logic, boolean, quality, static reference,
virtual data reference, set element reference

STRUCT : array, structure, structure array, virtual data,
set

SIZE : space occupied by the data

SCALE : normalisation factor

EXT : pointer to an extension descriptor
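
As an illustration of how an operand directive and a descriptor might be used together, the sketch below models the descriptor fields listed above as a Python record. Several points are assumptions of ours and not statements of the paper: the data address is taken to be BASE + DEPL., the descriptor and data stores are modelled as dictionaries, and only the (ct) and (op) directives are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Software image of the basic descriptor (field names follow the text)."""
    name: str                         # NAME : used when debugging
    indic: str                        # INDIC : "global", "local" or "parameter"
    base: int                         # BASE
    depl: int                         # DEPL. : displacement from the base
    type: str                         # TYPE : e.g. "integer"
    struct: Optional[str] = None      # STRUCT : e.g. "array", "set"
    size: int = 2                     # SIZE : space occupied by the data
    scale: int = 0                    # SCALE : normalisation factor
    ext: Optional["Descriptor"] = None  # EXT : extension descriptor

    def address(self) -> int:
        # Assumption: the data address is base + displacement.
        return self.base + self.depl


def operand_value(operand, descriptors, data):
    """Resolve a (directive, payload) operand to a value."""
    directive, payload = operand
    if directive == "ct":             # immediate constant
        return payload
    if directive == "op":             # payload is the address of a descriptor
        return data[descriptors[payload].address()]
    raise ValueError(f"directive not handled in this sketch: {directive}")
```

For example, with a descriptor stored at descriptor address 3 whose base is 100 and displacement 4, the operand `("op", 3)` reads the data word at address 104, while `("ct", 7)` yields 7 directly.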
3. LTR PROCESSOR STRUCTURE

The LTR processor structure follows from the
methodology described above.

The processor is composed of two pipe-line units,
one for macro-interpretation processing (MAI), the
second for micro-interpretation processing (MII)
(fig. 2).

Fig. 2 : LTR processor block-diagram


The central memory is divided into three physically
separate memories :

- the N-uple memory contains the intermediate code
and is accessible to the MAI processor only

- the descriptor memory contains the data descriptors,
system data and processes; it is accessible
to the MII processor only

- the data memory contains the data described in
the source program.

An N-uple is interpreted in two phases :

- the first, in the macro-interpreter (MAI), manages
IML execution control; it divides an N-uple
into simple instructions which it sends to the
micro-interpreter (MII)

- the second phase, therefore, takes place in the
micro-interpreter (MII), which merely executes,
sequentially, the actions sent by the MAI : search
for an operand descriptor, conversion of a number,
arithmetic operations, ...; these actions correspond
to a set of microprograms contained in the
MII control store.

The connection between the two units is made
through two hardware queues :
a parameter queue and an action number queue. Moreover,
state variables and calculation results may
pass between the two units.

The two queues allow synchronization of the
two processors and ensure pipe-line management.
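
The division of work between the two units can be sketched in software. The example below is a simplified model under our own assumptions: the action numbers, the `ACTIONS` table and the two functions are hypothetical, and the two phases run one after the other instead of overlapping as they would in the hardware pipe-line.

```python
from collections import deque

# Hypothetical action numbers and the micro-operations they select.
ACTIONS = {
    0: lambda stack, p: stack.append(p),                          # load a constant
    1: lambda stack, p: stack.append(stack.pop() + stack.pop()),  # add two values
}


def macro_interpret(n_uple, action_queue, parameter_queue):
    """MAI side: break an N-uple into elementary actions and queue them."""
    opcode, *operands = n_uple
    if opcode != "ADD":
        raise ValueError(f"operation code not handled in this sketch: {opcode}")
    for value in operands:
        action_queue.append(0)
        parameter_queue.append(value)
    action_queue.append(1)
    parameter_queue.append(None)   # the add action takes no parameter


def micro_interpret(action_queue, parameter_queue):
    """MII side: execute the queued actions sequentially."""
    stack = []
    while action_queue:
        action = action_queue.popleft()
        parameter = parameter_queue.popleft()
        ACTIONS[action](stack, parameter)
    return stack.pop()
```

The MII side never sees the N-uple itself: it only consumes action numbers and parameters, which is what allows the two units to be decoupled by the queues.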


The division of the stores according to the
information they contain allows real parallelism
between the different accesses and also a
specialization of each access :

- the N-uple memory has read access over 4 bytes

- the descriptor memory has a double read/write
access over 10 bytes; the first contains the
descriptor and the second the context of the
micro-machine

- the data memory has read/write access over two
bytes, the size of the data path being 16 bits.

The scheduling algorithm runs on the micro-interpreter,
which sends a task number to the MAI;
the context set is described in the CONTEXT section.

3.1. Macro Interpreter Structure (fig. 3)

The macro-interpreter supports the formal and
organic processing attached to system execution
control. Formal processing amounts to management of
the N-uple ordinal counter (management of the
recursion of IML instructions) and organic processing
concerns procedure context switching. Context
switching may occur on two types of event :

- switching on interruption

- switching on process call

In the first case, the interrupted process contexts
may be managed in stacks; the interruption
mechanism can be implemented according to a hierarchic
algorithm.

While the process attached to an interruption of
level i is running, it can be interrupted only by an
interruption of level j (j > i); control will be
returned, after processing of level j, to the level i
process or to a process with a higher priority.

This mechanism may be implemented with the help of
just one stack, the top context being the active
context.

On the other hand, for a process activated by an
open call, it is possible to avoid returning to the
calling process. A stack must, therefore, be allocated
to this process and, during switching, the number
of the stack containing the caller's context must
be saved. The task is then executed in its
own stack space. For all closed calls, the context
may be saved in the active stack (a mechanism
identical to that of activations on interruption), and
for pseudo-open calls (an open call which returns control
to the calling process) two stack spaces are
sufficient.

We allow for 16 stack spaces (15 + interruption),
which permit an interleaving of 15 open calls without
return to the calling process. The size of each space
is assessed at 1 Kwords. This space and the management
mechanism are represented by stack II. The
micro-interpreter context will be switched at the top of
the active stack, the active stack being found in
the process descriptor.

Ordinal counter management is ensured by a reentrant
microprogrammed interpreter whose essential
functions are :

- access to the source text

- analysis of the instruction operation code

- breaking up an N-uple into elementary ACTION
functions.
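
The hierarchic interruption mechanism with a single stack can be sketched as follows. The class and its methods are hypothetical; we assume that the background process has level -1 and, for simplicity, that a masked request is simply refused, whereas a real system would keep it pending.

```python
class InterruptStack:
    """Single context stack; the top context is the active one."""

    def __init__(self, background="main"):
        # Assumption: the background process sits below every interruption level.
        self.contexts = [(-1, background)]   # (level, process name)

    @property
    def active(self):
        return self.contexts[-1]

    def interrupt(self, level, process):
        """Accept an interruption only if its level exceeds the active level."""
        if level <= self.active[0]:
            return False                     # masked by the running process
        self.contexts.append((level, process))
        return True

    def end_of_processing(self):
        """Pop the finished context; the interrupted one becomes active again."""
        if len(self.contexts) == 1:
            raise RuntimeError("the background process cannot be popped")
        return self.contexts.pop()
```

Because a level i process can only be interrupted by a level j > i, the levels on the stack are strictly increasing from bottom to top, which is why one stack is sufficient.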
Fig. 4 : THE MICRO-INTERPRETER


Fig.  5  :  THE  SCHEDULING  PRINCIPLE 


Example  :  Interpretation  of  an  IF  instruction, 

Vfhen  the  operation  code  is  decoded,  the  inter- 
pretation  consists  in  t 

-  stacking  the  three  addresses  <a]><a2><a3>  in 
stack  II  of  the  active  procedure 

-  loading  <a  »  onto  the  CPT  register 

.-  calling  a  rule  <boolean  exprossion>  ,  (I) 

The  end  of  the  <Expbool  block>  is  supplied  by 
the  comparator  which  determines  the  egality  between 
.the  CPT  register  and  the  instruction  counter  (IC) . 

Depending  on  the  value  of  the  boolean  transmit¬ 
ted  by  the  Mil,  address  a  2  loaded  onto  the  IC 
[and  address  aj  is  loaded  onto  the  CPT  (value  0}  or 
[address  a 2  is  loaded  onto  the  CPT  and  the  IC  regis¬ 
ter  is  not  affected  (value  I),  At  the  end  of 
tblook  THEN>  ,  address  83  is  loaded  onto  IC, 

(1) This call is carried out by stacking AR onto Stack I, and
the return of the rule provokes a pop operation. This mechanism
allows an interpretation of the language in accordance with a
method of descending analysis.
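
The register traffic of the IF example can be traced with a small simulation. The Python sketch below is an illustration only: the function name, the use of plain integers for addresses, and the reading of a1, a2 and a3 as the ends of the boolean expression, of the THEN block and of the whole IF instruction are assumptions made here, not definitions taken from the LTR machine.

```python
def interpret_if(condition, a1, a2, a3):
    """Trace the IC and CPT registers while an IF instruction is interpreted.

    Assumed meanings: a1 = end of the boolean expression, a2 = end of the
    THEN block, a3 = end of the whole IF instruction.
    Returns the (start, end) addresses of the block that is executed and
    the final value of the instruction counter.
    """
    stack_ii = [a1, a2, a3]      # the three addresses are stacked in stack II
    cpt = a1                     # CPT marks the end of the boolean expression
    ic = cpt                     # comparator signals IC == CPT: expression done
    if condition:                # boolean value 1: run the THEN block
        cpt = a2                 # IC is not affected, the block ends at a2
        executed = (ic, cpt)
        ic = a3                  # at the end of the THEN block, jump to a3
    else:                        # boolean value 0: skip the THEN block
        ic = a2
        cpt = a3
        executed = (ic, cpt)
        ic = cpt                 # the comparator fires again at a3
    stack_ii.clear()             # the stacked addresses are no longer needed
    return executed, ic
```

In both cases the instruction counter ends at a3; only the block delimited by IC and CPT differs.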


3.2.  Micro-Interpreter  Structure  (fig. 4) 

The micro-interpreter corresponds to the CPU of conventional
computers. It is composed of a control store containing the set
of interpretation microprograms and a data path formed by an
arithmetic logic unit (AMD 2903) and a Bit Pattern Manipulator
(BPM) capable of performing logic operations on bit sets
(permutation of bytes, extraction and scaling of bit fields,
concatenation). The MiI manages access to the descriptor and
data stores and executes the part of organic processing
relative to the management of the data space of a procedure.

The access register of the descriptor store is, in fact, a
local memory composed of three blocks of ten bytes. This memory
constitutes an extension of the internal registers of the
AMD 2903 microprocessor.

The first block contains a procedure descriptor or a data
descriptor, the second may contain a data descriptor, and the
third contains the MiI context. We shall see in the CONTEXT
section that this solution allows an optimisation of context
switching.

4. STORE MANAGEMENT


4.1. Data store management


Logically, this store should be managed in such a way that the
implantation of data and the way of accessing it are directly
deducible from the LTR system structure and from the
constraints quoted in (1).

The structuring of the program into ARTICLES suggests
addressing relative to different bases. This technique allows,
moreover, the definition of a protection for each segment, an
important factor in the real-time field.

It will, however, be necessary to allow for direct addressing,
in particular for the passage of parameters by address.

Since the LTR processor takes the recursion and reentry of
procedures and processes into account, we are led to allocate
a stack for each process, where the context of each procedure
call will be conserved and the local data of the called
procedure will be created.

It can be seen that base addressing alone is not sufficient to
manage the memory efficiently. There is a possibility of a
proliferation of zones of dynamically created data. It follows
that it will be difficult to recover the free space, and for
this reason we have added to the addressing system a system of
storage allocation by paging and a "topographic" store.

However, we have also tried to adapt the addressing mode to the
type of accessed data by addressing directly the global data,
whose life expectancy is that of the system, and reserving
topographic addressing for data with a shorter life. The
characteristics of these different zones are determined by the
requirements of the LTR system to be executed.


To sum up, we have allowed for the following addressing modes,
which appear in the descriptions of the system variables:

- general direct addressing, for the use of data declared in a
GLOBAL DATA ARTICLE

- direct addressing, for the use of the process or procedure
call parameters and also the sets

- topographic addressing, localised in the process, for the use
of data declared in a DATA ARTICLE

- topographic addressing, localised in the procedure, which
involves the process stack, for the use of data declared in a
PROCEDURE ARTICLE, global or not.


Different address calculations

Let TOPO be the function calculating the real address of a
variable from its virtual address. This association function
consists in replacing the virtual page number by the real page
number. This association is realised during storage allocation
by the operating system and is materialized by a "topographic"
store. The list of pages allocated to a process is part of its
context.

An address of this type is always contained in a pointer:

- Calculation of a process local address (PSA)

a = TOPO ((base L) + displacement)

- Calculation of a procedure local address (PDA)

Therefore:

- Calculation of a general direct address (GDA)

a = (base G) + displacement

- Calculation of a reference direct address (DA)

a = displacement


4.2. Descriptor addressing

The data of a program are referenced in the code through the
intermediary of a descriptor. It is implanted in a memory
10 bytes wide and addressable on 64 K. However, in order to
simplify program debugging, the LTR source text may be compiled
by modules (an executable system may be composed of several
modules). The solution classically adopted in machine languages
to assemble the different modules consists in performing the
linking dynamically. We have not retained this solution, as it
has proved to be too time costly in execution and considerably
increases the system overhead time. We have, therefore, chosen
to address the descriptors by (base, displacement). Therefore,
at a given moment we have three bases:

- Base of global data descriptors

- Base of data descriptors

- Base of local data descriptors for the active procedure.

The values of these bases are determined when loading the
blocks they reference. It is to be noted that these bases are
an integral part of the process context.
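
The address calculations listed under "Different address calculations" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the page size, the dictionary used as "topographic" store and all function names are hypothetical, since the paper gives neither a page size nor an implementation.

```python
PAGE_SIZE = 256  # hypothetical page size in words; the paper does not give one


def topo(virtual_address, page_table):
    """TOPO: replace the virtual page number by the real page number."""
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[virtual_page] * PAGE_SIZE + offset


def general_direct_address(base_g, displacement):
    """GDA: a = (base G) + displacement."""
    return base_g + displacement


def reference_direct_address(displacement):
    """DA: a = displacement."""
    return displacement


def process_local_address(base_l, displacement, page_table):
    """Process local address: a = TOPO((base L) + displacement)."""
    return topo(base_l + displacement, page_table)


# Virtual page -> real page; this list of pages is part of the process context.
page_table = {0: 7, 1: 3}

print(general_direct_address(0x2000, 4))                # 8196
print(process_local_address(0x0100, 0x10, page_table))  # 784
```

Only the topographic calculation consults the page table; the two direct modes are plain additions, which is why the text reserves them for long-lived global data and for parameters passed by address.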


4.3.  Implementation  of  data  systems 

We shall now examine the solutions adopted for the
implementation of the system processors, notably the scheduler,
the management of events and semaphores, and interrupts.

4.3.1. Processor implantation

The processor monitors are microprogrammed and run on the
micromachine. The data manipulated by these programs are
implanted in the form of descriptors, for protection purposes.
In effect, only the microprograms are authorized to write in
the descriptor store during the execution of a system. These
processors manipulate descriptor strings.


4.3.2. Implantation of scheduler data

The scheduler manipulates process descriptors. These
descriptors have the following structure:

NAME | INDIC | LAV | LAR | EXT | BASE CODE | BASE DESC DATA | BASE PROG SPACE | PROINIT | STACK NUMBER

NAME : pointer towards process identification

INDIC : process current state word

LAV, LAR : stringing of process in queues

EXT : pointer towards an extension

BASE CODE : address of code implantation

BASE DESC DATA : address of data descriptors

BASE PROG SPACE : address of data

PROINIT : pointer towards procedure status descriptor

The scheduler manipulates only the CU processor's queue (ready
processes). In effect, the other lists are manipulated by the
other system processors, which will return control to the
scheduler at the end of their execution. The head of this list
is represented by a descriptor implanted at a fixed address,
with the form:

FIRST | LAST | NB LIST | NB CREATED | NB ACTIVE

FIRST, LAST : reference points on the list

NB LIST : number of processes in the list

NB CREATED : number of processes created

NB ACTIVE : number of active processes at present.
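
The ready queue and its list head can be pictured as a doubly linked list of process descriptors. The following Python sketch is only an illustration: it assumes that LAV and LAR are the forward and backward links of a descriptor, keeps just a few of the fields listed above, and uses class and method names that are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class ProcessDescriptor:
    name: str
    indic: str = "ready"                         # process current state word
    lav: Optional["ProcessDescriptor"] = None    # assumed: forward link
    lar: Optional["ProcessDescriptor"] = None    # assumed: backward link


@dataclass(eq=False)
class ListHead:
    first: Optional[ProcessDescriptor] = None    # FIRST
    last: Optional[ProcessDescriptor] = None     # LAST
    nb_list: int = 0                             # NB LIST

    def append(self, proc):
        """String a process at the tail of the queue."""
        proc.lav, proc.lar = None, self.last
        if self.last is None:
            self.first = proc
        else:
            self.last.lav = proc
        self.last = proc
        self.nb_list += 1

    def pop_first(self):
        """Remove and return the process at the head of the queue."""
        proc = self.first
        if proc is None:
            return None
        self.first = proc.lav
        if self.first is None:
            self.last = None
        else:
            self.first.lar = None
        proc.lav = proc.lar = None
        self.nb_list -= 1
        return proc
```

Because the head holds both FIRST and LAST, insertion at either end takes constant time, which suits a scheduler that runs in microcode.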


4.3.3. Event and semaphore management

We first decided not to implant event expression resolution.
Our choice was motivated by the complexity of such a resolution
and the multiplication of hardware it would cause. We have,
therefore, grouped the processing of events and semaphores.

The physiognomy of the descriptors manipulated is as follows:

NAME | VALUE | TYPE | FIRST | LAST

NAME : pointer towards the semaphore or event identifier

VALUE : value at a given instant of the variable

TYPE : event/semaphore

FIRST, LAST : processor queue reference


4.3.4. Interruption management

The interruptions are materialized by a descriptor with the
form:

INTERRUPT | STATE | ATTACH PROCESS ADDRESS

The IT descriptors are implanted at addresses equal to their
running level (IT No. i -> descriptor at address i). When an IT
is enabled, the IT processor inserts the process at the head of
the queue. The scheduler takes control and, if necessary,
activates it.

This processing concerns ITs directly connected with a task.
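
As an illustration of this mechanism, the Python sketch below indexes the IT descriptors by interrupt level and inserts the attached process at the head of the ready queue. The process names, the "masked" state and the dictionary layout are assumptions chosen for the example; the paper only names the three descriptor fields.

```python
from collections import deque

ready_queue = deque(["P1", "P2"])   # ready processes, head on the left

# IT descriptors implanted at the address equal to their level.
it_descriptors = {
    3: {"state": "enabled", "attached_process": "P_clock"},
    5: {"state": "masked", "attached_process": "P_disk"},
}


def raise_interrupt(level):
    """Insert the attached process at the head of the queue if IT is enabled."""
    descriptor = it_descriptors[level]      # direct access by level
    if descriptor["state"] != "enabled":
        return False
    ready_queue.appendleft(descriptor["attached_process"])
    return True
```

Placing the attached process at the head means the scheduler finds it first when it next takes control.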


5. INTERRUPTIBILITY

The definition of interruptibility at a "logic" level, that is,
at the level of the intermediate language and the
macro-interpreter, is very delicate, or even impossible, given
the contextual interpretation mode we have chosen. An
"instruction", or execution unit, at this level is, in effect,
something of variable length, and may even be the program
itself.

The concept of point of interruptibility must, therefore, be
more closely defined, even if the macro-interpreter level
presents the interest of reducing context volume to a minimum
when the interrupt is enabled.

The division of an N-uple by the Macro-Interpreter into ACTIONS
permits the interruptible points to be fixed at the beginning
of each ACTION. This choice establishes a compromise between
the volume of information to be saved and the time needed to
set up this safeguard. In effect:

- The fastest possible takeover of the interrupts entails
switching a larger amount of data, and therefore an effective
time such that this policy is in danger of losing its interest.

- A takeover deferred until certain key moments in the
execution of a program entails the manipulation of a smaller
amount of data and may, therefore, be more efficient than
immediate processing.

Moreover, at the beginning of an ACTION, the MAI context is at
a minimum. However, to justify this choice, the execution time
of an action must remain compatible with the requirements of
interrupt processing.

6.  CONTEXT 

Given  the  machine  structure  we  have  described, 
this  context  will  be  larger  than  that  found  on  a 
conventional  machine.  It  is,  moreover,  spread  over 
several  functional  units  and,  thus,  may  be  divided 
into  three  parts  : 

-  task  characterisation  context 

-  macro-interpreter  context 

-  micro-interpreter  context 

6.1. Task characterisation

This is the part of the context which is closest to the
information found on a classic machine. It defines both the
identity of the process and its work space for anything
concerning the data manipulated.

Definition of process identity includes the following
information:

NAME : pointer to the name of the process

ADCODE : process start address

NIT : number of the linked interrupt

This information will be contained in a specific location in
the descriptor memory.

The definition of process work space includes the following
information:

ADDESC : description space base

STACK : number of the execution stacks in the Macro-Interpreter

BASE 1 (G) : process global data base

BASE 2 (L) : process local data base

BASE 3 (2) : local data base for running procedure

BASE 4 : address of page table for the process.

The type of topographic implantation chosen (see above) calls
for the constitution of correspondence tables, virtual pages ->
real pages, proper to each process. During execution of a
process this table is loaded in a specialized memory and must
exist in memory so that it can be reestablished after
interruption followed by context switching.

6.2. Macro-Interpreter context

The execution of a process brings about an evolution of the
information contained in the macro-interpreter, characterizing
the logical evolution of interpretation.

This information also may be put in three parts:

- Program context

. IC : instruction counter of the program in IML
. CPT : address of end of block under examination
. STACK II and TOP 2 : address stack for the end of the
included block and its pointer

- Interpretation context

. AR : address register on interpretation program
. STACK 2 : return address stack at the end of the decoding
submicroprogram

- State of communication with the micromachine

. Generated actions queue and its pointers
. Queue of parameters to be transmitted and its pointers.

6.3. Micromachine context

The volume of significant context in the micromachine has been
reduced considerably by the fact that the interrupts are
enabled between two actions, as we have said above.

The information to be saved consists of the five registers
making up the external register of the AMD 2903. These
registers are used to transmit the parameters between the
various actions. It is to be noted that, as this extension is
in direct access with the descriptor memory, its content is
saved in a single memory cycle.

This information will, therefore, be saved in the space
descriptor of the interrupted process.

CONCLUSION 

The high level computer architectures previously studied or
realized concerned monotask languages. This study shows the
principal problems met in the implementation of a multi-task
real time language.

Interpretation processing has been divided into three classes:

- organic processing associated with the management of a
multi-task system

- formal processing associated with the control of one task

- effective processing associated with the execution of each
instruction of one procedure.

The hardware structure has been designed to support these
three kinds of processing efficiently.

The realization of a prototype able to support the LTR
language should allow the validation of these concepts.


Abstract. We introduce an architecture which performs many of
the optimizations commonly seen in sophisticated compilers for
high-level languages, including redundant expression
elimination and the movement of invariant expressions out of
loops. The instruction set of this machine allows simple
compilers to produce a graph-structured object code which is
both compact and efficient. The architecture features a cache
which records the values and dependencies of HLL expressions in
order to avoid later recomputations and memory references.
Preliminary experimental results indicate a speedup
approaching a factor of two over a pure stack architecture on
some programs.


1.  Introduction 

The arguments in favor of closing the "semantic gap" between
source program and object program are well known by
participants of this conference. Myers [1] characterizes the
job of the computer architect as determining the proper
division of total system functionality between software,
firmware, and hardware. Two extremes of this division are
possible. At one extreme we have traditional architectures
which tend to leave too much to the software and are ill-suited
to the software they execute. Complex operating systems are
necessary to make them useful; complex compilers are necessary
to make high-level languages (HLLs) execute efficiently. At the
other extreme we have architectures which attempt to execute
high-level languages directly. These architectures are often
inefficient themselves; program representations appropriate
for programmers are not always appropriate for computers. It is
likely that better cost-performance can be achieved by an
architecture which falls somewhere between these extremes. Our
architecture is one of many such; it is aimed at reducing or
eliminating the need for (and hence the costs of) optimizing
compilers by performing important optimizations in hardware.
It does not directly address other dimensions of the problem,
such as the complexity of operating systems.

The total cost of optimizing compilers is great. Their
construction is a formidable software engineering task. The
code they produce is almost always obscure, occasionally worse
than no optimization, and sometimes just plain wrong. They also
execute more slowly, and hence exact a price on each
compilation. Research is underway in several places aimed at
reducing this cost through the automatic, or semi-automatic,
generation of such compilers [2]. Our approach to this problem
is different; we are trying to raise the hardware/software
interface above the level of the compiler's optimization phase,
thus reducing the compiler's task to (mainly) lexical analysis
and parsing. Efficient algorithms for these phases are known,
and the automatic construction of such compilers would be
within our grasp.

Our architecture is able to perform two common and important
optimizations: redundant expression elimination and a type of
code motion typified by the movement of invariant expressions
out of loops. These optimizations traditionally require
sophisticated flow analysis during compilation, so their
elimination from compilers should be beneficial. Our research
is aimed at determining how big an impact this architecture can
have on the total cost-performance of a compiler-architecture
pair.

In this paper we will introduce the architecture and argue its
advantages informally and by example. Other work is under way
to determine the architecture's quantitative benefits over a
range of real programs. Because we are interested in basic
feasibility, we defer the specification of many details which
would be necessary before the architecture could be realized.
In particular, we are not specifying how to implement the
architecture, nor are we specifying the instruction set beyond
what we absolutely need. So as not to be overly distracted by
language issues, we have chosen FORTRAN as our high-level
language. We believe that the necessary extensions for other
languages would be no more difficult on our architecture than
on others, and therefore they are irrelevant to the current
goals of the research.

2. Basic Concepts

To briefly outline the thrust of the architecture, consider the
FORTRAN statement

    X = (A + B)*C + (A + B)

which has this parse tree:

          =
         / \
        X   +
           / \
          /   \
         *     +
        / \   / \
       +   C A   B
      / \
     A   B

Suppose we had an instruction set which closely mimicked this
parse tree representation, one instruction per node. Each
instruction might be a triple

    (OPCODE, LEFT PART, RIGHT PART)

where left part and right part would be addresses of
instructions which calculate the operands. The execution of an
instruction would consist of recursively evaluating the left
and right parts of the instruction, followed by the application
of the indicated operation. This architecture could be
implemented using two stacks: one to hold intermediate
computations and one to hold partially-evaluated instructions
during the post-order traversal of the parse tree. The order of
instructions in memory would be irrelevant in this instruction
set -- the control flow is specified explicitly. The
translation of the above statement would be

        =, X, P1
    P1: +, P2, P3
    P2: *, P4, C
    P3: +, A, B
    P4: +, A, B

This instruction set is obviously very inefficient, but it can
illustrate two points. First, because the instructions labeled
P3 and P4 are identical, there is no reason to duplicate them;
we can eliminate P4 and change P2 to

    P2: *, P3, C

The subexpressions giving rise to P3 and P4 are called, in the
parlance of compilers, formally identical or congruent. This
simply means that they are identical in form -- not necessarily
that they have the same value. It is both simple and efficient
to detect formal identity during parsing, and doing so at
compile time allows us to represent programs more
space-efficiently in our architecture. By contrast, detecting
common subexpressions, i.e., formally identical expressions
that also are guaranteed to have the same value at execution
time, is not as simple or efficient. Our architecture will not
require the compiler to do this.
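
The compacted triple code can be exercised with a short Python sketch. It is an illustration under simplifying assumptions: the dictionary layout and the names `program`, `evaluate` and `trace` are hypothetical, and Python's own recursion stands in for the two stacks mentioned above.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# Graph-structured object code for X = (A + B)*C + (A + B).
# P4 has been eliminated: P1 and P2 both refer to the single phrase P3.
program = {
    "P1": ("+", "P2", "P3"),
    "P2": ("*", "P3", "C"),
    "P3": ("+", "A", "B"),
}


def evaluate(address, data, trace):
    """Recursively evaluate the left and right parts, then apply the opcode."""
    if address in data:                  # a leaf: a variable holding a value
        return data[address]
    opcode, left, right = program[address]
    left_value = evaluate(left, data, trace)
    right_value = evaluate(right, data, trace)
    trace.append(address)                # record each instruction evaluation
    return OPS[opcode](left_value, right_value)


data = {"A": 2, "B": 3, "C": 4}
trace = []
data["X"] = evaluate("P1", data, trace)
print(data["X"], trace)                  # 25 ['P3', 'P2', 'P3', 'P1']
```

The trace shows that P3, although stored only once, is still evaluated twice during the traversal.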

Notice that even though the expression "A + B" is represented
only once in the object program (using the aforementioned
compaction), it is actually evaluated twice in the implied
traversal of the parse tree. The structure of the object code
gives us the possibility of avoiding this recomputation.
Suppose that after completing the evaluation of P3 (while
computing the left part of P1) we saved the "value" of this
instruction in a cache, labeled by the address P3. If we
checked that cache before evaluating each instruction operand,
we could retrieve the value of P3 when computing the right part
of P1 without actually recomputing it. Suitable care would have
to be taken to record dependency information in the cache so
that we could remove the value, should either A or B change in
the future.

Our architecture provides such a cache, which is the major
source of execution-time efficiency. The effect of using this
cache corresponds closely to the elimination of redundant
expressions by optimizing compilers. In fact, this technique
may be superior, because it can eliminate expressions which are
redundant under the particular execution history of the
program. Consider, for instance, the following FORTRAN
statements:

    Y = A+B
    IF (Y .LT. 0) A = A+1
    X = A+B

Because the two occurrences of "A + B" are formally identical,
they can be computed by a single instruction which is
referenced in two assignment statements. It can be seen that
the value of the expression A + B, computed in the first
statement, can remain in the cache unless the assignment to A
actually takes place (invalidating A + B). The same mechanism
serves to move invariant expressions out of loops, since any
expression which does not depend on a value changed in the loop
will remain in the cache.
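
The behaviour of the cache on these three statements can be imitated in Python. The sketch below is a software analogy only, not the hardware cache: the class `ValueCache`, its methods, and the label `P_AB` for the shared "A + B" phrase are names assumed for the illustration.

```python
class ValueCache:
    """Values of phrases, keyed by instruction address, with dependencies."""

    def __init__(self):
        self.values = {}        # phrase address -> cached value
        self.depends_on = {}    # phrase address -> set of variables read

    def lookup(self, address):
        return self.values.get(address)

    def record(self, address, value, variables):
        self.values[address] = value
        self.depends_on[address] = set(variables)

    def invalidate(self, variable):
        """Remove every cached value that depends on the stored variable."""
        stale = [a for a, deps in self.depends_on.items() if variable in deps]
        for address in stale:
            del self.values[address]
            del self.depends_on[address]


def run(a, b):
    """Run Y = A+B; IF (Y .LT. 0) A = A+1; X = A+B.

    Returns X and the number of times the A+B phrase was really computed.
    """
    cache, evaluations = ValueCache(), 0
    memory = {"A": a, "B": b}

    def a_plus_b():
        nonlocal evaluations
        cached = cache.lookup("P_AB")
        if cached is not None:
            return cached                    # hit: no recomputation
        evaluations += 1
        value = memory["A"] + memory["B"]
        cache.record("P_AB", value, ["A", "B"])
        return value

    memory["Y"] = a_plus_b()
    if memory["Y"] < 0:
        memory["A"] += 1
        cache.invalidate("A")                # the store to A removes A+B
    memory["X"] = a_plus_b()
    return memory["X"], evaluations


print(run(1, 2))     # (3, 1): the assignment to A did not happen
print(run(-5, 2))    # (-2, 2): A changed, so A+B was recomputed
```

Whether the second evaluation is avoided depends on the execution history, which is the point made above about redundancy that a compiler cannot always establish statically.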

This simple example illustrates our architectural goal: to
provide an instruction set which preserves the structure of the
parse tree in a way that permits both space-efficient
representation (by having only one copy of the code for
formally identical expressions) and time-efficient execution
(by detecting and avoiding the re-evaluation of expressions
whose value has not changed).

3. The Architecture

We now introduce the architecture and instruction set currently
being used in our research. We would like to emphasize that
this version of the architecture is a research vehicle -- one
intended (only) to test the feasibility of the ideas and their
impact on performance. A realistic implementation would need to
address other issues and would require careful tuning and
elaboration of the instruction set.


There are four important parts of the machine, as indicated in
Figure 1:

Memory            A linear vector of fixed-size words, indexed
                  by address.

Evaluation Stack  A LIFO stack of words, used to hold
                  intermediate values during computation, much
                  the same as in other stack-oriented machines.

Control Stack     A LIFO stack of control information, used to
                  control the recursive descent through the
                  parse tree graph.

Value Cache       An associative memory used to save the values
                  of expressions.


The Control Stack and the Value Cache will be explained in more
detail later.

Figure 2: Memory word format

Every word in memory is a one-operand instruction, formatted as
a (tag, value) pair (Figure 2). Even words usually thought of
as data are, in this machine, instructions. The tag field is
further divided into a number of subfields, named r, x, i, and
op. op is the operation code (e.g. ADD), and r, x, and i are
single-bit fields denoting Return, Index, and Indirect. (These
will be described later.) The actual bitwise packing of these
fields into a word is not too important, but for concreteness,
we think of tag as being 8 bits and value as being (say)
24 bits. This would give us a 5-bit operation code and leave
24 bits for data or an address.
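
One possible packing of such a word is sketched below in Python. Since the paper leaves the bitwise layout open, the order of the r, x and i bits inside the tag and the opcode numbering are assumptions made purely for the example.

```python
OPCODES = {"INT": 0, "REAL": 1, "ADDR": 2, "ADD": 3}   # hypothetical numbering


def pack(op, r, x, i, value):
    """Pack r, x, i, a 5-bit opcode and a 24-bit value into a 32-bit word."""
    assert 0 <= op < 32 and 0 <= value < (1 << 24)
    tag = (r << 7) | (x << 6) | (i << 5) | op      # 8-bit tag (assumed order)
    return (tag << 24) | value


def unpack(word):
    """Split a 32-bit word back into its tag subfields and value."""
    tag, value = word >> 24, word & 0xFFFFFF
    return {
        "op": tag & 0x1F,
        "r": (tag >> 7) & 1,
        "x": (tag >> 6) & 1,
        "i": (tag >> 5) & 1,
        "value": value,
    }


word = pack(OPCODES["ADD"], r=1, x=0, i=1, value=0x1234)
print(hex(word))      # 0xa3001234
print(unpack(word))
```

Three single-bit fields plus a 5-bit opcode fill the 8-bit tag exactly, which is where the figure of 32 possible operation codes comes from.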


3.1 Instruction Classes

The instructions are divided into three classes according to
how their operands are interpreted. The three classes are data
instructions, address-operand instructions, and value-operand
instructions.


Data Instructions. The INT, REAL, and ADDR instructions
correspond to the three data types recognized by this simple
version of the architecture. Executing any of these
instructions causes them to push themselves (value and tag)
onto the Evaluation Stack, setting r=1 and x=i=0. The contents
of the value field in data instructions is the actual data
(i.e., in INT instructions, value is the integer datum, in REAL
it is the floating-point representation, and in ADDR
instructions it is an address).

The data instructions are quite like "tagged" data in other
HLL architectures. In particular, we will assume automatic type
conversion throughout -- there will not be separate
instructions for floating-point addition and integer addition,
for instance.

If X is a variable of type REAL with value 43.5, the name X
will be bound to the address of a word containing (the
instruction)

    REAL 43.5

The reason we make data words executable will become clear when
the operand-fetching mechanism is examined later.

Address-operand Instructions. These instructions include INC1
(increment-by-one), INC (general increment), STO (store), and
the twelve conditional-jump instructions. In each case, the
value field is interpreted as an address, and this address is
the instruction operand. The semantics of the instructions are
as follows:

STO   Removes the top word from the Evaluation Stack and stores
      it at the operand address. The r field is set to 1 in the
      stored word, and the x and i fields are set to 0.

INC   Removes the top word from the Evaluation Stack and adds
      it to the word at the operand address.

INC1  Increments the value of the word at the operand address
      by one.

JLT, JLE, JGT, JGE, JEQ, JNE
      Remove the top value from the Evaluation Stack and branch
      to the operand address if the value is less than, less
      than or equal to, greater than, greater than or equal to,
      equal to, or not equal to zero, respectively.

Value-operand Instructions. These instructions include PUSH
and the arithmetic instructions, ADD, SUB, MUL, and DIV. For
these instructions, the value field is again interpreted as an
address, but the operand is obtained by evaluating the address,
as explained below. Otherwise the semantics of the instructions
are as follows:

PUSH  pushes its operand onto the Evaluation Stack.

NEG   negates its operand before pushing it.

ADD   removes the top word from the Evaluation Stack and adds
      it to its operand, leaving the sum on the Evaluation
      Stack. Type conversions are performed, if necessary,
      according to standard FORTRAN conventions. (Type
      information is available in the tag fields of the data on
      the Evaluation Stack.)

SUB, MUL, DIV
      work like ADD, with the left-hand argument being on the
      stack and the right-hand argument being the operand of
      the instruction.

Occasionally, one will want an instruction such as ADD to take
both its operands from the stack. We therefore adopt the
convention that if value = 0, the operand normally specified in
the instruction will be found as the topmost element on the
Evaluation Stack. This applies to both address-operand and
value-operand instructions.

3  ?  Operand  evaluation 

As  stated  above,  value-operand  Instructions  obtain  their 
operands  by  cvnluaiing  the  address  which  appears  In  the 
instruction.  In  this  architecture,  the  evaluation  mechanism 
uniformly  repluces  the  "fetch-the-contents-of"  mechanism  in 
traditional  architectures  To  evaluate  an  address  A,  the 
current  instruction-execution  state  is  saved  on  the  Control 
Stack  und  execution  begins  at  A.  Alter  each  instruction 
completes,  the  r  bit  Is  examined;  II  n=  1,  (tie  Control  Slack  Is 
popped,  terminating  the  new  instruction  sequence  and 
returning  to  the  previous  one  ct  the  point  where  it  was 
interrupted.  In  our  examples,  we  will  indicate  that  an 
instruction  has  r  =  1  by  appending  V  lo  the  operation  name. 


Strictly speaking, there is no restriction on what
instructions can occur in the new instruction sequence.
However, it is our intent that the sequence of instructions,
which is called a phrase, will leave a single value on the
Evaluation Stack. If we make the further assumption that the
computation is independent of data already on the Evaluation
Stack, it is possible to speak of the value of A, or the value of
the phrase A.

Note that a single data instruction, with r = 1, satisfies
these conditions for a phrase. Hence, a single data word may
be "fetched" by evaluating (executing) it.
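
The evaluation rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the tuple encoding of instructions, the name Machine, and the operand values are assumptions. Python recursion stands in for the Control Stack, and the final STO is marked r = 1 only so that the sketch has a place to stop.

```python
# Each entry is (label, operation, operand, r). The encoding is hypothetical.
PROGRAM = [
    ("MAIN", "PUSH", "P3", 0),
    (None,   "MUL",  "C",  0),
    (None,   "ADD",  "P3", 0),
    (None,   "STO",  "X",  1),   # r = 1 here only to end the sketch
    ("P3",   "PUSH", "A",  0),   # phrase P3 computes A + B
    (None,   "ADD",  "B",  1),
    ("A",    "REAL", 2.0,  1),   # data words are one-instruction phrases
    ("B",    "REAL", 3.0,  1),
    ("C",    "REAL", 4.0,  1),
    ("X",    "REAL", 0.0,  1),
]


class Machine:
    def __init__(self, program):
        self.labels = {lab: i for i, (lab, _, _, _) in enumerate(program) if lab}
        self.memory = [[op, arg, r] for (_, op, arg, r) in program]
        self.stack = []  # Evaluation Stack

    def run(self, addr):
        """Execute from addr until an instruction with r = 1 completes."""
        while True:
            op, arg, r = self.memory[addr]
            if op == "REAL":      # data instruction: pushes its own value
                self.stack.append(arg)
            elif op == "STO":     # address-operand instruction
                self.memory[self.labels[arg]][1] = self.stack.pop()
            else:                 # value-operand: evaluate the address first
                operand = self.evaluate(self.labels[arg])
                if op == "PUSH":
                    self.stack.append(operand)
                elif op == "ADD":
                    self.stack.append(self.stack.pop() + operand)
                elif op == "MUL":
                    self.stack.append(self.stack.pop() * operand)
                else:
                    raise ValueError(op)
            if r:
                return            # corresponds to popping the Control Stack
            addr += 1

    def evaluate(self, addr):
        """Run the phrase at addr and hand back the single value it leaves."""
        self.run(addr)
        return self.stack.pop()


m = Machine(PROGRAM)
m.run(m.labels["MAIN"])
print(m.memory[m.labels["X"]][1])  # X = (A + B)*C + (A + B) with the values above
```

Note that fetching A, B or C goes through exactly the same path as evaluating the two-instruction phrase P3, which is the uniformity the text describes.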

3.3  Indexing 

The x field is provided in tags to perform some simple
address arithmetic. When x = 1, the address in the instruction
is first incremented by the value found on top of the Evaluation
Stack (which is removed as a side effect). The new address
becomes the operand (for address-operand instructions) or the
address to be evaluated to obtain the operand (for
value-operand instructions). In our examples, we will indicate that
x = 1 in an instruction by appending "x" to the instruction
name, as in "STOx A".

Occasionally it will be useful to obtain an indexed address
on the stack without evaluating the result. We therefore allow
the x field to be set in the ADDR instruction, in which case the
address present in the VALUE field of the ADDR instruction is
incremented by the value on top of the Evaluation Stack, and
the resulting address is pushed onto the stack.

3.4  Indirection 

The i field is used to provide an extra level of evaluation in
obtaining operands. When i = 1, the operand obtained by the
above mechanisms is evaluated an extra time to obtain the
true operand. For instance, in "STOi A", the address A is
evaluated, and the actual store occurs to the address returned
by the phrase A. In "ADDi A", the address A is first evaluated
normally; then the resulting value of A is evaluated, yielding the
operand.

This mechanism makes several assumptions. In particular,
in value-operand instructions it is assumed that the value
returned by the first evaluation is an address (so that it can be
evaluated again). Likewise, in address-operand instructions it
is assumed that the evaluation (the one caused by i = 1 is the
only one) produces a value of address type.

When i = x = 1, the indexing operation is applied before the
(first) evaluation.
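
The order in which indexing, evaluation and indirection apply can be summarised in a small Python function. It is a sketch under stated assumptions: resolve_operand and its parameters are hypothetical names, and the evaluate argument is any callable that maps an address to the value of the phrase stored there (in the usage below a plain dictionary lookup stands in for full phrase evaluation).

```python
def resolve_operand(kind, address, x, i, stack, evaluate):
    """Return the operand of an instruction.

    kind     -- "address" for address-operand instructions (e.g. STO),
                "value" for value-operand instructions (e.g. ADD, PUSH)
    x, i     -- the indexing and indirection bits (0 or 1)
    stack    -- the Evaluation Stack, a Python list
    evaluate -- callable: address -> value of the phrase at that address
    """
    if x:
        # Indexing comes first and consumes the top of the stack.
        address = address + stack.pop()
    if kind == "address":
        # Only i = 1 causes an evaluation; its result is assumed to be an address.
        return evaluate(address) if i else address
    value = evaluate(address)      # the normal evaluation
    if i:
        value = evaluate(value)    # extra evaluation; value assumed to be an address
    return value


# Toy memory: address -> value of the one-word phrase stored there.
memory = {10: 20, 13: 99, 20: 7}
fetch = memory.__getitem__

print(resolve_operand("value", 10, 0, 0, [], fetch))     # plain evaluation of 10
print(resolve_operand("value", 10, 0, 1, [], fetch))     # evaluate 10, then its result
print(resolve_operand("value", 10, 1, 0, [3], fetch))    # index to 13, then evaluate
print(resolve_operand("address", 10, 1, 0, [3], fetch))  # indexed address, no evaluation
```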

3.5  Discussion 

Returning  to  our  original  example,  we  can  see  what  the 
code  actually  looks  like  in  this  architecture. 

    X = (A + B)*C + (A + B)

         PUSH   P3
         MUL    C
         ADD    P3
         STO    X

    P3:  PUSH   A
         ADDr   B
    A:   REALr  23.6
    B:   REALr  -3.0
    C:   REALr  4.66E1
    X:   REALr  0.0

Note how the evaluation mechanism is exploited in
collecting the formally identical expressions into a single
phrase (P3).

The indexing and indirection mechanisms are optimizations
designed to facilitate address computations in array and
structure accesses, much like the use of index registers in
conventional architectures. In (a), below, we see the simplest
form of indexing; in (b) the two occurrences of "C(I)" have
been implemented as a single phrase; in (c) the phrase has
been constructed to compute the address of C(I) since both
the address and value are needed.

    (a)  C(I) = A(J)

             PUSH    J
             PUSHx   A-1
             PUSH    I
             STOx    C-1

    (b)  X = (C(I) + B)*C(I)

             PUSH    L
             ADD     B
             MUL     L
             STO     X
             ...
         L:  PUSH    I
             PUSHxr  C-1

    (c)  C(I) = C(I) + B

             PUSHi   L
             ADD     B
             STOi    L
         L:  PUSH    I
             ADDRxr  C-1

These examples indicate that there is some choice in how
to structure the object code. In terms of space-efficiency, any
expression appearing in the source program more than once
should be expanded as a separate phrase. Execution-time
efficiency can be gained by additionally separating expressions
used within a loop; if their value does not change, the effect is
the same as if the compiler had moved them outside the loop.

3.6 The Value Cache

The Value Cache is the most distinctive and important part of
the architecture. Its purpose is to save the value of phrases.
Every time an evaluation is attempted, the Value Cache is first
checked to see if it contains the phrase's value; if found, the
value can be immediately entered on the Evaluation Stack
without any need to actually execute the phrase in question. If
the Value Cache does not contain the desired value, evaluation
proceeds normally and the new value is copied into the Value
Cache as a side-effect of the processing of the r field in the
last instruction of the phrase.

An important part of the caching mechanism is keeping
track of dependency information. The value of a phrase can
depend on an unbounded set of memory locations, namely all
those which are referenced in the course of its evaluation.
Should any of these locations be changed, the old value in the
Value Cache must be purged.

Because the space available to represent dependency
information in the cache will be limited, we must have a way to
encode the dependency information. A possible
implementation is to represent the dependency set as a bit
vector of length n. A dependency on a particular memory
word with address A could then be mapped into one of the n
bits by an operation on the word's address, D(A). An inclusive
"OR" of all encoded addresses would then represent the
dependencies of the phrase. Purging from the cache all
values dependent on address B could be accomplished by
eliminating all entries which included bit D(B) in their
dependency mask.
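
The bit-vector encoding can be made concrete with a short Python sketch. It assumes the mapping D(A) = A mod n, which is the choice reported for the emulator in the Measurements section; the function names and the dictionary used for the cache are illustrative only.

```python
N_BITS = 32  # length n of the dependency bit vector


def d_bit(address, n=N_BITS):
    """Mask with the single bit selected by D(A) = A mod n."""
    return 1 << (address % n)


def dependency_mask(addresses, n=N_BITS):
    """Inclusive OR of the encoded addresses a phrase depends on."""
    mask = 0
    for a in addresses:
        mask |= d_bit(a, n)
    return mask


def purge(cache, changed_address, n=N_BITS):
    """Keep only the entries whose mask does not include bit D(changed_address)."""
    bit = d_bit(changed_address, n)
    return {phrase: (value, mask)
            for phrase, (value, mask) in cache.items()
            if not (mask & bit)}


# Hypothetical cache contents: phrase name -> (value, dependency mask).
cache = {
    "P": (51, dependency_mask([100, 205])),
    "Q": (7, dependency_mask([300])),
}

print(sorted(purge(cache, 100)))  # P depends on address 100, so only Q survives
print(sorted(purge(cache, 132)))  # 132 mod 32 == 100 mod 32: P is purged needlessly
print(sorted(purge(cache, 1)))    # no entry uses bit 1, so nothing is purged
```

The second call shows the price of the compact encoding: two addresses that map to the same bit cannot be told apart, so a purge may remove entries that were still valid. It never keeps an entry that is stale.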

To explain how the Value Cache is used, we need some
information about both the Value Cache and the Control Stack.
The Value Cache is an associative memory, each entry of
which has three fields:

    VC-ADDRESS      address of phrase
    VC-VALUE        value of phrase
    VC-DEPENDENCY   dependency of phrase

Control Stack entries also have three fields:

    P-ADDRESS       address of phrase
    E-STATE         current execution state
    CS-DEPENDENCY   accumulating dependency

There  are  four  activities  which  involve  the  evaluation 
mechanism  and  the  Value  Cache: 

Beginning an evaluation. The Value Cache is checked to see
if it contains the phrase's value; if so, the value is immediately
entered onto the Evaluation Stack and the evaluation is
considered complete; dependency information from the Value
Cache (VC-DEPENDENCY) is added to the dependencies being
accumulated for the current phrase (CS-DEPENDENCY). If the
phrase is not found, the current execution state is saved on
the Control Stack and a new frame is added for the new
phrase, whose evaluation begins. CS-DEPENDENCY for the new
phrase is initially null.

During evaluation. Every execution of a data instruction
represents a dependency: the dependency is derived from the
address of the data instruction. The encoded dependency is
added to the dependencies already recorded in CS-DEPENDENCY.

After evaluation. When an instruction with r = 1 is completed,
the phrase value (the top value on the Evaluation Stack),
P-ADDRESS, and CS-DEPENDENCY are sent to the Value Cache for
recording as VC-VALUE, VC-ADDRESS, and VC-DEPENDENCY,
respectively. (If the Value Cache is full, some mechanism for
removing entries must be employed.) The Control Stack is
then popped to return to the previous phrase; the
dependencies of the completed phrase are added to the
dependencies accumulating for the previous phrase. (That is,
if phrase A invokes phrase B, phrase A's dependencies include
those of phrase B.)

During a store operation. Whenever a STO, INC, or INC1
instruction is executed, every Value Cache entry which shows
a dependency on the altered word is purged. (This may not be
a perfect discrimination, depending on the encoding D(X).)
The value being stored (itself a phrase) is entered into the
Value Cache as a side-effect; its dependency is precisely itself.
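
The four activities can be put together in a small Python model. It is a sketch under several assumptions rather than a description of the actual hardware: the instruction encoding and the names CachedMachine, evaluate and store are hypothetical, a Python dictionary stands in for the associative memory, recursion stands in for the saved execution state, D(A) = A mod 32 is used as the encoding, and no cache-full policy is modelled.

```python
# (label, operation, operand, r): phrase L computes I + J.
PROGRAM = [
    ("L", "PUSH", "I", 0),
    (None, "ADD", "J", 1),
    ("I", "INT", 6, 1),
    ("J", "INT", 45, 1),
]


class CachedMachine:
    def __init__(self, program, n_bits=32):
        self.labels = {lab: i for i, (lab, _, _, _) in enumerate(program) if lab}
        self.memory = [[op, arg, r] for (_, op, arg, r) in program]
        self.stack = []     # Evaluation Stack
        self.deps = [0]     # CS-DEPENDENCY, one mask per Control Stack frame
        self.cache = {}     # VC-ADDRESS -> (VC-VALUE, VC-DEPENDENCY)
        self.n_bits = n_bits
        self.executed = 0   # instructions actually executed

    def D(self, addr):
        return 1 << (addr % self.n_bits)

    def evaluate(self, start):
        # Beginning an evaluation: a hit returns the value and passes on its mask.
        if start in self.cache:
            value, mask = self.cache[start]
            self.deps[-1] |= mask
            return value
        self.deps.append(0)  # miss: new frame with a null dependency
        addr = start
        while True:
            op, arg, r = self.memory[addr]
            self.executed += 1
            if op == "INT":  # during evaluation: a data instruction is a dependency
                self.stack.append(arg)
                self.deps[-1] |= self.D(addr)
            else:
                operand = self.evaluate(self.labels[arg])
                if op == "PUSH":
                    self.stack.append(operand)
                elif op == "ADD":
                    self.stack.append(self.stack.pop() + operand)
                else:
                    raise ValueError(op)
            if r:
                break
            addr += 1
        # After evaluation: record the result and merge masks into the caller's frame.
        value, mask = self.stack.pop(), self.deps.pop()
        self.cache[start] = (value, mask)
        self.deps[-1] |= mask
        return value

    def store(self, label, value):
        # During a store: purge dependents, then cache the stored word itself.
        addr = self.labels[label]
        bit = self.D(addr)
        self.cache = {a: e for a, e in self.cache.items() if not (e[1] & bit)}
        self.memory[addr][1] = value
        self.cache[addr] = (value, bit)


m = CachedMachine(PROGRAM)
first = m.evaluate(m.labels["L"])   # executes the phrase and fills the cache
second = m.evaluate(m.labels["L"])  # served from the cache, nothing executed
m.store("J", 40)                    # purges L (it depends on J) but not I
third = m.evaluate(m.labels["L"])   # re-executes L; I and J come from the cache
print(first, second, third, m.executed)
```

In this model the caller pushes the returned value onto the Evaluation Stack, which has the same effect as the cache entering the value on the stack directly.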

As an example, consider the following (assume M(6) = 45):

    K = M(I) + I

         PUSH   L
         STO    K
    L:   PUSH   I
         PUSHx  M-1
         ADDr   I
    I:   INTr   6
    K:   INTr   45


There are four phrases entered in the Value Cache after
executing this statement:

    VC-ADDRESS   VC-VALUE    VC-DEPENDENCY
    I            INT 6       D(I)
    M + 5        INT 45      D(M + 5)
    L            ADDR M+5    D(I) ∨ D(M + 5)
    K            INT 51      D(K)

If we later changed the value of I, the phrases I and L would
be purged from the Value Cache, but M(6) (i.e. M + 5) would
remain, unless by chance D(I) = D(M + 5).


4.  Measurements 

To obtain objective measures of the performance of this
architecture, we present here analyses of four simple programs:
three production-quality statistical subroutines taken from the
Scientific Subroutine Package and one simple
quadratic-equation solver taken from an introductory programming text.
When we say production-quality, we mean that there is no
obvious way to rewrite the source program more efficiently in
the statistical subroutines. In contrast to this, the
quadratic-equation program contains several examples of formally
identical (and redundant) expressions.

We examined the execution of these programs on three
compiler/architecture pairs: on our architecture with a simple
compiler performing no optimizations; on a DEC PDP-10 with
the FORTRAN-10 optimizing compiler; and on a modified stack
architecture (MSA). The MSA is a variant of our architecture,
obtained by eliminating the evaluation mechanism (including
Value Cache and Control Stack) in favor of the simple
"fetch-the-contents-of" mechanism; it is thus a simple stack
architecture with the same one-operand instructions as in our
architecture. The compiler for this architecture is identical to
the one for our principal architecture.

Code size statistics were obtained from listings of the
compiled assembly code. Execution statistics were obtained
from instruction traces on the PDP-10 and from emulators of
the other architectures. In emulating our architecture, we used
a Value Cache with 100 entries and a 32-bit-wide dependency
field with D(A) = A mod 32.

In comparing program sizes, we assume that a "word" is
equivalent on the different architectures. Likewise, execution
statistics are expressed as the number of memory fetches and
stores (instructions plus data). We do not count internal
processing, so all instructions take unit time unless they
involve a fetch or store from memory. (We do not consider the
Value Cache to be memory in this sense.) With this in mind,
we present the data in Tables 1 and 2. Tables 3 and 4 present
the same data as a fraction of the MSA values.


                    Architecture
    Program    PDP-10     Ours      MSA
    S1            186      211      224
    S2            148      168      166
    S3             80       94       96
    S4            121      118      169

    Table 1: Code size (words)

                    Architecture
    Program    PDP-10     Ours      MSA
    S1          2,162    2,414    3,647
    S2          1,282    1,728    2,219
    S3          6,616    9,666   12,942
    S4            408      447      824

    Table 2: Execution speed (fetches)

                 Architecture
    Program    PDP-10     Ours
    S1            .59      .66
    S2            .58      .78
    S3            .50      .76
    S4            .50      .54

    Table 4: Execution speed (fraction of MSA)

The PDP-10 and MSA are in a sense upper and lower
bounds for comparison purposes. The PDP-10 is a mature
instruction set in the traditional Von Neumann mold; it has
been carefully designed and optimized. MSA on the other
hand is the simplest stack machine one can imagine. Likewise
the PDP-10 incorporates a sophisticated compiler, whereas the
other architectures have very simple compilers. (In particular,
they do not even have to do register allocation.)

The data confirms that the PDP-10 is still the more highly
optimized architecture, but in the case of the S4 program, our
simple compiler was able to produce code which was more
compact and which executed almost as quickly. Clearly the
benefits depend to some extent on the degree to which
redundant expressions can be eliminated during compilation
and execution. However, even with well-coded programs, we
see a significant improvement over a simple stack architecture.

Of course, these few examples cannot alone establish the
benefits of our architecture. They are meant only as an informal
argument to establish the possibility of such benefits, even in
programs not easily optimized. We hope to provide more
quantitative evidence on a wider range of programs in the
future, along with more information on the effect of the size of
the Value Cache and on cache-full policies [3].
Abstract

A mechanism for supporting fine-grain
program protection and abstraction in a multi-computer
context is described. It is argued that
such features are necessary to support high level
user interfaces and particularly high level language
implementations using microprogram control, and
that their cost must be small in relation to
microinstructions. The mechanism is currently
being investigated by simulation techniques as
part of a general-purpose system study.

Objectives 

The most important objective of general-purpose
computer design is to model accurately,
reliably and efficiently the data of widely varying
problem domains. We might instance records,
messages, tax tables or graphical images as typical
classes of data familiar to computer users, and
to the extent that the attributes of a class,
neither more nor less, are recognised we can say
that a successful abstraction has been achieved.
We define a 'high level' architecture as one that
supports such abstractions for an open-ended list
of classes. Its importance is that it enables complex
data processing applications to be developed
and maintained in a reliable state by offering to
information engineers something comparable with
the subassemblies and precise tolerances of, say,
mechanical design. Overall, one expects as a
result to produce better systems more quickly and
more reliably and at a lower cost than would otherwise
be possible.

The complexities of operating systems
have drawn attention to the importance of program
structure, most designers making use of the ideas
of task (i.e. process), file, segment, event and
others in abstract form. We could include code
segment in the list and thus lead to the accurate,
reliable and efficient modelling of high level languages,
but it would be a mistake in the present
context to put either operating system or language
engineers in positions of privilege since (through
no fault of their own) that seems to guarantee poor
response to user requirements. For example, in
range-defined architecture (in the style of the
IBM 360) the microprogrammer has in effect been a
language engineer with considerable privilege:
for precisely that reason it has been impractical
to make wide use of improvements in the encoding
of high level languages which depend on having
variable intermediate code formats. Attempts to
define architectures at even higher level run
correspondingly higher risks.

The order of events, therefore, is to
define the abstraction mechanism first and then
use it to model whatever operational behaviour is
required. But what is meant by doing that
'efficiently'? Fifteen years ago, under the
umbrella provided by the IBM 360, it seemed sufficient
to achieve the objective with 'no increase
in program size or loss of speed', which is essentially
what happened with the Basic Language
Machine¹. Today that umbrella is permeable and to
out-perform current range-defined architectures is
commonplace. The essential requirement now seems
to be to provide the benefits of abstraction at
the finest level of description used by system,
language or application engineers - in other words
at what is usually regarded as the microcode level.
Once that is done, the way is open to realising in
a practical context the advantages of microcoding
that have often been demonstrated under special
conditions.


In this paper I shall outline a design,
which for reasons soon to become clear is called
a "Pointer-Number system", which demonstrates one
way of meeting the objectives. It takes account
of system requirements not mentioned here, and has
been carried to a detailed simulation in order to
make realistic performance estimates. In the next
subsection we review the techniques on which it is
based and the range of problems that have to be
solved at the next stage of design. The following
subsections outline respectively the 'PN Machine'
and 'PN System'. Finally, some conclusions are
drawn from the experimental work done so far. The
reader is referred to the PN System Manual¹ for
more detailed explanation and justification.

Abstraction Mechanisms

The basic requirement is to mechanise the
ideas that might be expressed as: "Let A be a
class of objects with attributes {a_i}, i = 0 .. L",
"Let x be a (member of the class) A", "Let y
denote (the same member of the same class as) x",
and so on, all the representations being within the
limits of a finite computer store. In programming
terms this quickly resolves into the use of descriptors
or pointers as a type of operand distinct
from the attribute sets that represent the individual
objects, a construct that has been used from
the earliest days, though it was not precisely engineered
until segmented storage came into use in
the early 1960's (Figure 1). In the case of program
space the connection between (indexed) pointer
and attribute is notionally direct, but it is a
simple extension of the same idea to interpret the
descriptor as referring to a member of any given
class of objects, which was the generalisation made
in the Basic Language Machine (Figure 2). In the
latter case the pointer contains indices α, id that
uniquely identify the class and object in question.
In accordance with current practice we refer to
pointers used in this indirect way as "capabilities"
but the term "codeword" is retained for the special
case of reference to storage.

Figure 1: Storage segment
(A CODEWORD with fields T, L, F refers to a
Representation holding the attributes (a0) ... (aL).)
T: Segment type
L: Maximum index
F: Location

It is implicit that pointers cannot be
forged, otherwise the whole point of having precisely
engineered program structures is lost. On the
other hand they must be manufactured somewhere and
the class manager must be able to manipulate the
representations directly. Such considerations lead
to the notion of protected domains characterised by
sets of pointers that define the 'rights' of a program
at any instant of its execution. As control
flows from one domain to another there must be
corresponding changes in the list of rights.

Before discussing possible mechanisations
we should be aware of the performance parameters to
look for in the final analysis. Amongst the most
important is the time taken to access the attribute
given a valid pointer: there is no absolute figure
to aim for, but it is required to be short in relation
to the class of operations that it supports.
For example, in dealing with files or tasks the
individual operations are fairly substantial and a
number of capability systems have been implemented
in which pointers are interpreted by the operating
system without serious loss of speed. In moving
towards simpler operations the interpretive mechanism
must be refined and assisted, first by microprogram
and finally by hardware, and in the present
context the stringent requirement of having low
overhead in relation to micro-operations forces us
to disregard all but the most delicate controls.
In the model provided by Figure 1 we might nominate
the 'effective storage access time' as the relevant
parameter. In Figure 2 the critical time is that
taken to move the locus of control from the 'user
domain', containing the capability, to the 'class
manager domain' in which interpretation takes place
and back again. In either case, if the observed
cost is too high users will tend to avoid the
facility and lose its benefits.


Figure 2: Indirect class representation
(A CAPABILITY with fields α, V, id in the user's domain
refers, across the protection boundary, to the Object
Table for class α and from there to the Representation.)
α: Object class
V: Access options
id: Object identifier


The other factors are more difficult to
quantify because they entail the inevitable compromise
between cost of management and ease of use.
It might be asked: "If members of a class are
generated at a given rate, what is the resulting
management overhead?". For example, how often can
one open new files, create messages, or assign new
tasks without undue penalty? Clearly, some costs
are passed on to storage management which has to
provide file control blocks, buffers, task vectors
and so on, but there remains the responsibility
for master object tables, for recovering 'dead'
identifiers, and for error management. The techniques
available for reducing costs are mainly concerned
with the time taken to scan the program
space looking for particular classes of pointer
and might be aimed at eliminating that need
entirely, e.g. by:

(a) enlarging the master object tables to service
all foreseeable requests; or

(b) restricting the use of pointers, e.g. by
indirect reference through system tables or
by linguistic devices;

alternatively we can seek to minimise the actual
scanning time by:

(c) limiting the extent of pointer-bearing
segments; or

(d) constraining the program structure, e.g. to
separate task domains or to a 'tree' form.

In any well-designed capability system the constraints
are small in relation to the benefits they
bring, but the fact remains they are a psychological
hindrance to widespread acceptance. The
best way round that, architecturally speaking, is
by:

(e) providing high speed memory scanning and updating
operations, enabling many of the
restrictions to be relaxed.

The last solution is pursued in the PN system by
using what are effectively microprogrammed management
procedures in conjunction with hard-wired
'planar' memory scanning functions.

Returning to the primary measure of
storage access rate, it is clear that no scheme
dependent on validating pointers at time of use
(against access list, segment table, capability
registers, etc) would be acceptable, and in order
to compete with 'unrestricted' access mechanisms
we are forced (i) to admit pointers as operands
used directly by machine instructions; and (ii) to
control their formation so as to preserve the
integrity of programs. There still seems to be no
better way of doing that than by using a tagged
register format. However, in moving the control
mechanism to microinstruction level the interpretation
of tags must be resolved in single
micro-orders. In theory, just one tag bit is necessary,
to distinguish between pointers and numbers, but it
will be seen in the next subsection that fifteen
pointers and one form of number are distinguished
by a four-bit tag code.

We have already seen that because of its
practical importance storage is distinguished from
all other abstract classes. A further distinction
is drawn between sharable (global) and unsharable
(local) data areas. The corresponding pointers are
codewords and addresses respectively, which have
almost identical properties in normal use. It is
unfortunate to make the distinction, but it reflects
the fact that controlled access to shared
resources uses a single level of indirection which
is otherwise unnecessary. The same mechanism is
used to distinguish between data that might be at
a remote site in a multicomputer system (and therefore
'global') and data areas that are strictly
local.


Figure 3 illustrates the use of pointers
in referring to different program workspaces. The
transformation α is handled by capability managers,
while β is the responsibility of the segment manager.
Parallels can be drawn between writing in a
conventional high level programming language and
operating on global data, between microprogramming
and working at local level. However, a key feature
of the PN system is that sharp distinctions are not
drawn and it is easy to move from one level to the
next.

Figure 3: Levels of program space


A protection domain is defined by the combined
effect of two sets of rules: those that
govern the inheritance of access rights in registers
and storage, formation of new addresses from
old ones, restriction of access options in capabilities,
etc, all of which are reductive in character;
and those concerned with the expansion of
rights in passing from one domain to another. The
ability to expand rights depends on some prior
authority saying in advance that "program module M
shall only access resources m0, ..., mk", which
in turn devolves on the construction of control
segments and associated data. Apart from the need
for speed and flexibility in implementing such a
rule we also require that it should be easy to
apply and not expensive to support. In the PN
system the region into which rights expand is defined
by a set of resources known as a base. There
will be several bases in a system, so there is
scope for partitioning at that level. The objects
m0, ..., mk are identified by indices that are
embedded in object code. That seems to be the most
economical way of changing access lists, since it
is done at zero cost in conjunction with control
transfers. It will be shown later how the interconnection
mechanism is supported by machine functions
in the context of a dynamically changing
base, task and module population.


Pointer-Number  Machines 

In order to evaluate the above ideas in a
practical system context a detailed machine model
known as "microPN" has been defined and simulated.
The intention has been to provide full support for
abstraction in the context of an assembly of
processor-memory pairs, each comparable in cost
and speed with current microprogrammable machines.

The main components of microPN are shown
in Figure 4. The register file (X) consists of 16
32-bit general-purpose registers. Most internal
machine operations can be completed in one or two
ALU cycles, typically processing the 'high' halves
of the operands first, which includes checking
their tags, followed by the 'low' portions. The
ALU carries out elementary arithmetic, logic and
shift operations on numeric words, and the special
operations required in controlled pointer formation.


The sequence controller plays a conventional
role. The most frequently used control
fields (control pointer, condition codes) are held
as separate registers, the remainder being found
in the general register file and protected from
mis-use by overall controls on program construction.
They include base and task indices, stack
base and current stack frame, current control segment
index.

The local memory controller serves requests
for data and instruction accesses within the processor
and external requests arriving via the
global memory controller. The memory operations
include normal fetch and store of byte, word and
tagged values, and 'planar' accesses arising from
the use of local memory as an active storage
device.

The four high order bits of each register
contain a tag, as shown in Table 1. The remaining
28 bits are interpreted accordingly. The format
of tagged elements in store is the same as for
registers. Note that tags 0..7 are 'global', and
have the same meaning for every machine in an
assembly, while tags 8..f are addresses with no
meaning outside the processor in which they occur.
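
The tag layout just described lends itself to a short Python sketch. Only the facts stated in the text are used: a 32-bit register, a tag in the four high order bits (which leaves 28 bits for the rest of the word), tags 0..7 global and tags 8..f local addresses. The function names are hypothetical, and the meaning of individual tag values is deliberately not modelled.

```python
WORD_BITS = 32
TAG_BITS = 4
PAYLOAD_BITS = WORD_BITS - TAG_BITS  # bits left after the tag
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def split_word(word):
    """Return (tag, payload) for a 32-bit tagged word."""
    tag = (word >> PAYLOAD_BITS) & 0xF
    return tag, word & PAYLOAD_MASK


def make_word(tag, payload):
    """Assemble a tagged word; rejects values that do not fit their fields."""
    if not 0 <= tag <= 0xF or not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("field out of range")
    return (tag << PAYLOAD_BITS) | payload


def is_global(tag):
    """Tags 0..7 have the same meaning on every machine in an assembly."""
    return 0 <= tag <= 7


def is_local_address(tag):
    """Tags 8..f are addresses meaningful only inside one processor."""
    return 8 <= tag <= 0xF


word = make_word(0x9, 0x10)
tag, payload = split_word(word)
print(hex(word), tag, payload, is_local_address(tag))
```

A check such as is_local_address is the kind of test that would have to be applied before a word is allowed to leave the processor, since a local address has no meaning elsewhere.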


Figure 4: General schematic of microPN machine
(Components shown: global memory control, with external
connections and an interprocessor highway (data 16,
address 16); local memory control; sequence control;
general purpose registers (X); arithmetic and logic unit
(ALU); plane ALU; plane registers; local memory; planar
memory, up to 64 Kplanes.)
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microPN  REGISTER  FORMATS. 


GLOBAL OBJECTS (tags 0..7)

    Integer
    Entry pointer
    Indexable capability
    Indexable codeword
    System capability
    Control pointer
    Capability
    Codeword

LOCAL OBJECTS (tags 8..f)

    Read-write address
    Read-only address
It can be seen from Table 1 that capabilities
and codewords have 'arithmetic' and 'non-arithmetic'
formats: in the former the object index
or identifier can be altered by arithmetic operations.
In neither case can the class or segment
index be changed without authority. A distinction
can thus be drawn between a 'singular' reference to
an object or element of a segment and one that can
be treated as one of a sequence.

Local  objects  are  the  addresses  in  local 
memory  (starting  at  byte  position  F  or  plane  P)  of 
L+1 consecutive elements of the specified type.

The  local  store  is  extended  by  an  optional  planar 
store  which  serves  as  a  back-up  for  the  (presumed) 
faster  local  memory.  In  microPN  planes  are  just 
256  bits  in  size,  and  to  enjoy  the  full  advantage 
of the addressing scheme it is envisaged that
planes  of  1024  or  4096  bits  will  be  used  in  practice. 
Data  is  transferred  between  levels  via  the  planar 
register  unit. 

Global  segments  are  addressed  indirectly 
by  the  global  memory  controller  through  a  segment 
table  which  might  be  associated  with  another  micro¬ 
PN  processor  in  the  same  assembly.  Segment  table 
entries  have  the  same  form  as  addresses.  Figure  5 
shows  the  principle  of  interprocessor  communication 
assuming  a  bi-directional  data  and  address  bus  of 
32  bits.  The  requesting  program  applies  a  memory 
function m to the codeword (e,i). From e the
position of the 'host' is found; if not in the same
processor-memory pair the parameters are
transmitted to the receiving module, where the
function m is interpreted with reference to its
segment table. A suitable reply is sent to the
requesting  program.  Details  of  the  interaction 
depend  on  performance  objectives  and  cannot  be 
meaningfully  examined  until  program  design  strate¬ 
gies  have  been  fully  explored. 
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The  'plane  ALU'  operates  on  three  planar 
registers, each 256 bits in microPN: an accumulator
which can be regarded as 16 words of 16 bits or one
bit from each of 256 words stored in plane sequence;
a carry plane associated with the accumulator for
bit-serial operations; and an activity plane that
selectively controls store write operations. A
further set of operations is provided to move the
accumulator in either 'row' or 'column' direction,
with linear or cyclic edge connections. The planar
functions are designed primarily to assist in high
speed operations on numerical data, digitised
images, signal data, etc. However, in the present
context  planes  play  a  prominent  part  as  32-byte 
units  of  memory  allocation,  and  planar  functions 
are  used  in  module  interconnection  and  scanning 
operations.  The  conventional  store  operations  are 
extended  to  transmit  numeric  data  between  general 
purpose  registers  and  word  planes  along  common  row 
or  column  data  lines.  Hence  the  design  achieves 
another  fundamental  objective,  of  easy  transition 
between  'parallel'  and  'scalar'  modes  of  operation. 
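
As a rough illustration of the planar operations just described, the following sketch models a 256-bit plane as 16 rows of 16 bits. It is only a sketch under stated assumptions: the function names, the list-of-lists representation and the direction of each shift (rows move down, columns move right) are not from the paper; only the 16 x 16 shape, the linear/cyclic edge choice and the role of the activity plane come from the text.

```python
SIZE = 16  # a 256-bit plane viewed as 16 rows of 16 bits (from the text)


def shift_rows(plane, cyclic):
    """Move every row one place down. The vacated top row is refilled from
    the old bottom row (cyclic edge) or with zeros (linear edge)."""
    top = plane[-1] if cyclic else [0] * SIZE
    return [list(top)] + [list(row) for row in plane[:-1]]


def shift_columns(plane, cyclic):
    """Move every column one place right, using the same edge rule."""
    return [[(row[-1] if cyclic else 0)] + list(row[:-1]) for row in plane]


def masked_store(memory_plane, accumulator, activity):
    """Write accumulator bits into memory only where the activity plane is 1."""
    return [[a if act else m for m, a, act in zip(mem_row, acc_row, act_row)]
            for mem_row, acc_row, act_row
            in zip(memory_plane, accumulator, activity)]
```

With a cyclic edge a bit leaving one side of the plane reappears on the opposite side; with a linear edge it is lost and a zero is shifted in.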

In  a  tagged  machine  the  instruction  set  is 
designed  to  carry  out  normal  arithmetic  and  logical 
functions  on  numeric  data  and  to  provide  separate 
functions  for  operating  on  pointers.  Thus  the 
'modify'  function  in  various  forms  applies  to  any 
address  and  increases  F  (or  P)  by  a  given  amount, 
decreasing L accordingly. The 'limit' operations
reset  L  to  a  lower  value.  If  the  bounds  of  the 
original  sequence  are  exceeded  an  'invalid  address' 
(system  capability  class  8,  see  below)  is  returned. 
In  that  way  the  current  protection  domain  can  be 
delineated  with  a  precision  of  one  byte. 
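
The effect of 'modify' and 'limit' can be sketched as follows. This is an illustrative model, not the microPN instruction definition: the Address class, the use of None to stand for the 'invalid address' result, and the rejection of negative amounts are all assumptions. What comes from the text is that an address covers L+1 elements starting at F, that 'modify' raises F and lowers L by the same amount, and that 'limit' can only lower L.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    F: int  # first byte position
    L: int  # limit: the address covers L+1 consecutive elements


INVALID = None  # stand-in for the class 8 'invalid address' result


def modify(addr, amount):
    """Advance F by `amount` and shrink L to match; access is never widened."""
    if amount < 0 or amount > addr.L:
        return INVALID
    return Address(addr.F + amount, addr.L - amount)


def limit(addr, new_l):
    """Reset L to a lower (or equal) value."""
    if new_l < 0 or new_l > addr.L:
        return INVALID
    return Address(addr.F, new_l)
```

Starting from Address(100, 9), which covers bytes 100 to 109, modify by 3 gives Address(103, 6), while modify by 10 or limit to 10 would step outside the original sequence and yields INVALID.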

In microPN there are eight primary function
groups, of which four are tag-independent and
four restrict the tag of one or two general-purpose
registers. The tag limitations can be simply
expressed in tabular form and as far as can be seen
would have very little effect on cost or speed.
Nevertheless  the  essential  protection  mechanisms 
have  been  retained. 

An  incidental  effect  of  the  PN  protection 
scheme  is  to  make  it  easy  to  apply  'execute-only' 
options  to  control  segments.  Advantage  has  been 
taken  of  that  to  preserve  some  engineering  flexi¬ 
bility  and  to  undertake  some  security  checks  during 
program translation. For example, all register,
base, label and system function indices are checked
by  the  compiler  and  written  into  code  sequences 
knowing  that  they  cannot  be  changed  by  the  user. 
Similarly,  privileged  function  codes  (such  as  'set 
tag')  can  be  generated  without  direct  control  by 
the  programmer  and  there  is  no  need  for  a  distinct 
'microsystem  state'.  There  is,  of  course,  the 
possibility  of  code  being  corrupted  by  store  mal¬ 
function  which,  like  pointer  errors,  could  lead  to 
wider  breakdown.  Whether  to  control  such  errors  by 
further checks on the code, the pointers, the task
space, the processor, ... or at some other boundary
depends  on  the  type  of  reliability  and  availability 
that  is  demanded. 
Pointer-Number systems

The PN system supports ten classes of
abstract objects, see Table 2. The aim of each
abstraction is to disclose as much about each class
as the user needs to know in order to operate on it
efficiently, concealing attributes that are irrelevant
or liable to change. For example, binary
instruction formats are concealed in the definition
of control segments in order to allow freedom to
change the instruction coding. The system abstract
objects constitute the resources available for
program construction at the lowest design level. To
reach the level of facility normally seen by
application or system programmers new classes of object
such as 'message' or 'queue' will be implemented in
terms of those that already exist. The use of
separate tag codes for 'system' and 'user'
capabilities, while not strictly necessary, is helpful in
defining system structure.


TABLE 2

PN System capabilities

(All elements in this group have tag 4;
the index value id identifies a member
of the class c)

    c:   0   Null
         1   Control segment
         2   Pointer segment
         3   Base
         4   Task
         5   File
         6   Host (Processor-memory pair)
         7   CFC (see text)
         8   Function error
        16   Numeric segment

The  principles  of  capability  management 
are  widely  understood,  so  we  examine  here  only 
aspects  peculiar  to  the  PN  system. 

Data  segments 

An  important  distinction  is  drawn  between 
data  segments  (identified  by  numeric  or  pointer 
capabilities)  and  access  paths  to  them  (identified 
by codewords). A given segment may be accessible
through 0, 1 or more such paths at a time, each
using  a  distinct  index.  Their  allocation  is  con¬ 
trolled  by  system  functions  to  facilitate  data 
sharing  at  global  level.  The  distinction  is  impor¬ 
tant  because  not  all  operations  on  segments  demand 
access  to  individual  elements:  for  example,  one 
might  want  to  know  the  type  or  size,  position  in 
the  hierarchy,  or  simply  to  pass  the  segment  capa¬ 
bility  as  a  parameter. 


Control segments

In the same way, control segment capabilities
are distinguished from control pointers (tag
1 or 5). A control segment contains encoded
instructions and data derived from definitions given
in the system programming language. Although many
features of the PN machine are abstracted the segment
size, which contributes to channel loading and
working set requirements, is not: in microPN the
maximum size is 4096 bytes. There is only weak
connection between segments and control flow, i.e.
change of segment does not imply change of procedure,
nor vice versa, the reason being that although
one  can  sometimes  take  advantage  of  such  conventions 
it  is  usually  undesirable  to  couple  logical  control 
structure  to  physical  store  assignment. 

The  definition  of  control  segments  in¬ 
cludes  a  precise  specification  of  the  registers 
they  use,  their  entry  points,  and  external  connec¬ 
tions  that  may  be  established  with  reference  to  the 
environment  at  time  of  use.  The  compiler,  in  con¬ 
junction  with  machine  functions,  ensures  that  the 
bounds  so  defined  are  strictly  observed.  That  is 
the  essential  requirement  of  software  engineering, 
brought  down  to  'micromachine'  level.  A  logical 
property  of  a  control  segment  (Figure  6)  is  that 
the  only  resources  it  can  use  are  those  defined  in 
or  accessible  from  the  registers  at  a  point  of 
entry (e1, e2 or e3 in Fig. 6), or those acquired
by expansion (m1 or m2), or those that it creates
by  using  one  of  the  resource  managers. 
Figure 6: Interconnection of control segments


It is theoretically attractive to have
precise control over which of the entry points to
a module can be used in a given context. For
example, if M controlled a class of queues and e1,
e2 and e3 allowed users to 'join', to 'leave', and
to 'delete' a specified queue, it might be desirable
to withhold e3 from all but a limited subset
of users. That would mean having distinct pointers
for each entry point and increased overheads in the
management of bases. On balance, it is preferable
to define only a single codeword for the module,
say M, and to enumerate the entry pointers as M,
M+1, and M+2, corresponding to e1, e2 and e3 in the
example. More precise control can be achieved by
(a) using separate control segments for 'join' and
'leave' on one hand and 'delete' on the other;
(b) by using part of the identifier field to encode
the permissible operations (the 'access options' in
Fig. 2); or (c) by controlling the indexing
operations in a higher level language.

Once formed, a control segment is ready
for execution. There is no need to load or
consolidate it into a particular program, task or
processor space. The reason for that design decision
is that it gives the greatest flexibility in
program construction at a cost which, from experience
of similar systems, appears to be small. External
connections are defined by reference to the current
base and task, but since the same segment might be
in concurrent execution with reference to several
different bases and tasks, each with different
components, the environmental vectors are treated as
'sparse' and connection is made by an associative
search using the resource name as argument. The
association is done by parallel (planar) operations
and is relatively fast.

The only method of expanding rights is via
the list of resource names, and strictly speaking
the inclusion of a name in a control segment should
be subject to formal checks. It would be possible
to give a list of 'valid' names to each user or
software design group, but here again the advantage
gained from a strict rule of construction must be
balanced against the cost of administering it. In
our experience informal controls are sufficient for
most applications, wherein the 'prior authority'
can verify by inspection of the source code that a
control module (such as M) cannot extend its effect
beyond the permitted bounds (such as m1 and m2).

Function  errors 


For any machine or system function
construed as 'failing' there is a choice of aborting
the task or returning a recognisably invalid result
from system capability class 8. The choice is a
practical matter: for example, illegal tags abort
the program, whereas address overflow returns an
invalid address. If the former option is taken the
'result' of a task is itself a class 8 capability.
In all cases the encoding of the index field gives
the function type and reason for failure.

A  similar  convention  can  be  applied  in 
the  user  domain,  returning  class  0  system  capabil¬ 
ities  ('Null')  to  indicate  failure.  With  regard 
to  dynamic  type  checking,  the  user  can  easily 
'break open' a capability to examine its class and
tag fields. There are three courses of action:

(a) to assume all types are correct and expect
    to fail later (e.g. on tagcheck) if they
    are not;

(b) to check types and fail gracefully; or

(c) to check types and return a Null result.

There are many tactical variations; which to use
depends on the level of understanding between
caller and callee, and it is important not to
pre-empt the decision in system design.
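
The three courses of action can be contrasted in a small sketch. Everything here is hypothetical apart from the facts taken from the text (user capabilities carry tag 6, and 'Null' is the class 0 system capability with tag 4): the Capability tuple, the queue example, the class number 20 and the exception type are invented for illustration.

```python
from collections import namedtuple

Capability = namedtuple("Capability", "tag cls index")

NULL = Capability(tag=4, cls=0, index=0)  # class 0 system capability ('Null')
USER_TAG = 6                              # tag of user capabilities (from the text)


class TypeCheckFailure(Exception):
    """Raised by course (b) so the caller can fail in a controlled way."""


def length_a(cap, queues):
    # (a) trust the caller: no check here, so a wrong type is only
    # caught later, if at all
    return len(queues[cap.index])


def length_b(cap, queues, queue_class):
    # (b) check the type and fail gracefully
    if cap.tag != USER_TAG or cap.cls != queue_class:
        raise TypeCheckFailure("not a queue capability")
    return len(queues[cap.index])


def length_c(cap, queues, queue_class):
    # (c) check the type and return a Null result
    if cap.tag != USER_TAG or cap.cls != queue_class:
        return NULL
    return len(queues[cap.index])
```

Which variant suits a given interface depends, as the text says, on how much caller and callee can assume about each other.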


Capability  management 


To form a new class of abstract objects
the designer requests permission from the system,
which returns a capability-forming-capability (CFC)
containing the index c of the new class. To form
a new capability one can then present to the
system that CFC together with the object index id.
In return a tag 6 user capability, class c, index
id is obtained:

[Diagram: CFC and object id give user capability
(tag 6, class c, index id).]
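
The exchange described above can be sketched as follows. This is an illustrative model only: the System class, its method names and the choice of 17 as the first free class number are assumptions. The tag and class values (system capabilities have tag 4, the CFC is class 7, user capabilities have tag 6) are taken from the text and Table 2.

```python
from collections import namedtuple

Capability = namedtuple("Capability", "tag cls index")

SYSTEM_TAG, USER_TAG, CFC_CLASS = 4, 6, 7  # values from the text and Table 2


class System:
    def __init__(self, first_free_class=17):  # assumed starting point
        self._next_class = first_free_class

    def new_class(self):
        """Grant a new class: return a CFC whose index is the class number."""
        cfc = Capability(SYSTEM_TAG, CFC_CLASS, self._next_class)
        self._next_class += 1
        return cfc

    def form(self, cfc, object_id):
        """Present a CFC and an object index; obtain a user capability."""
        if (cfc.tag, cfc.cls) != (SYSTEM_TAG, CFC_CLASS):
            raise PermissionError("a genuine CFC is required")
        return Capability(USER_TAG, cfc.index, object_id)
```

Only the holder of the CFC can form capabilities of the new class, which is why the manager of a class would keep the CFC private.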


We now see that the typical 'package'
dealing with a class of objects consists of a
manager M, whose name is made public, and
essentially private data structures such as the master
object table and CFC whose names (m1 and m2) are
excluded from other segments. Disclosure of M will
also  document  the  functions  of  its  entry  points. 

The  'difficult'  aspects  of  M  are  concerned  with 
index  management  which,  as  we  saw  earlier,  leads 
to  various  forms  of  evasion.  In  microPN,  system 
support  is  offered  to  delete  either  (a)  a  given 
capability  or  (b)  a  capability  class  (authorised 
by  the  CFC)  from  program  space. 


Inevitably,  pointers  must  be  scanned 
looking  for  such  capabilities.  In  a  multicomputer 
system the rate of scanning store has two important
characteristics: (i) it is relatively high,
because  of  the  close  connection  between  processors 
and  memories,  and  (ii)  it  is  roughly  constant 
because  additional  memory  brings  with  it  additional 
processing  power.  As  a  result  we  can  suggest  index 
management strategies based on the use of small
m.o.t.'s  whose  entries  are  recycled  when  no  longer 
in  use. 


For  practical  reasons  store  allocation  is 
serviced  by  a  special  set  of  system  functions,  but 
the  above  comments  on  index  management  are  equally 
applicable  to  codewords  and  addresses.  The  planar 
memory  functions  are  particularly  important  in 
store  compaction. 

*  *  * 

In  summary,  it  might  be  said  that  the  main 
problem  of  microsystem  design  is  not  to  invent  new 
facilities  but  to  select  a  basic  subset  from  the 
range  of  possibilities  on  offer.  It  is  paradox¬ 
ical  that  at  a  time  of  great  abundance  in  hardware 
the  need  for  stringency  in  design  is  greater  than 
ever,  but  the  fact  remains  that  there  are  great 
dangers  from  'overkill'  in  hardware  and  software. 
In microPN the decisive factors are the need to
maintain security at microprogram level, and
unwillingness to suffer loss of performance in doing
so. Emphasis is therefore placed on the ability
to construct high level systems rather than commit
the design in one direction or another.
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Simulation 

The  PN  design  is  based  on  a  computer  mod¬ 
ule  assumed  to  be  comparable  in  speed  and  complexity 
with  current  microprogrammable  machines.  Besides 
playing  its  part  as  a  member  of  an  assembly,  each 
must satisfy the most exacting requirements of
program  reliability  and  language  implementation, 
which  carry  over  (still  unsatisfied)  from  conven¬ 
tional  design.  Before  making  specific  hardware 
recommendations  it  is  necessary  to  study  in  depth 
the  program  organisation  and  behaviour  that  can  be 
expected  in  practice,  so  the  approach  has  been  to 
simulate  one  computer  module  and  to  make  measure¬ 
ments  from  which  the  performance  of  an  assembly  can 
be  inferred.  The  simulator  runs  under  the  UNIX 
operating  system  on  the  PDP-11  series  of  computers. 
Facilities  available  include  a  system  implementation 
language,  system  support  and  error  management  func¬ 
tions,  library  and  on-line  documentation. 

A multitask system is simulated, and
between any two control points it is possible to
count:

    (i)    instructions obeyed
    (ii)   local store accesses
    (iii)  global store accesses
    (iv)   stack usage
    (v)    procedure calls
    (vi)   module interconnections
    (vii)  planar functions obeyed
    (viii) planar routing distance, and
    (ix)   interrupts.

Elapsed time in the host system is also available,
and PN system functions can readily be modified to
give measures of resource usage, static measures of
instruction coding, etc. The significance of the
above figures should be clear. Taking store traffic
as the main parameter of performance, it is
found that for every 100 bytes of instruction about
30-60 further bytes of data are handled. If the
data were all global, the overhead of segment table
access would thus be 25-40%, but that is never the
case: it is rare for less than 90% of data accesses
to be local and we conclude that the overhead is
negligible.
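
The arithmetic behind that conclusion can be checked with a small calculation. The model used here is an assumption, not something stated in the paper: each byte of global data is taken to cost one further byte of segment table traffic, and the overhead is expressed as a fraction of the ordinary instruction-plus-data traffic. The function name and parameters are likewise hypothetical.

```python
def segment_table_overhead(data_bytes, global_fraction, instr_bytes=100):
    """Extra segment-table traffic as a fraction of ordinary store traffic."""
    # assumed: one extra byte of table traffic per byte of global data
    extra = data_bytes * global_fraction
    return extra / (instr_bytes + data_bytes)


for data_bytes in (30, 60):
    all_global = segment_table_overhead(data_bytes, 1.0)
    mostly_local = segment_table_overhead(data_bytes, 0.1)
    print(data_bytes, all_global, mostly_local)
```

Under this reading, fully global data gives an overhead of roughly 23% to 38%, in line with the 25-40% quoted, while with 90% of data accesses local it falls to roughly 2% to 4%.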

As already shown, change of access list is
implicit in moving from one section of code to
another, but for each register saved or restored at a
domain boundary six bytes of data and instruction
are used. A complete task change in microPN
generates about 500 bytes of store traffic, while the
search of external names associated with module
interconnection generates about 200 bytes (50
instructions obeyed). In scanning operations, about one
machine instruction is obeyed for each pointer
examined, so that a typical stack (less than 100
tagged values) would be scanned in 10 µsec.

The  above  figures  begin  to  provide  the 
context  for  high  level  program  design  decisions, 
e.g.  whether  to  use  global  or  local  workspace,  how 
to  distribute  segments  across  computer  modules, 
when to use advanced forms of binding, what mixture
of interpretive and in-line control to use,
and so on. Quite often a high performance figure
is  traded  for  some  other  attribute  such  as  resil¬ 
ience  or  responsiveness  which  is  difficult  to 
quantify.  A  vital  objective  is  to  achieve  perfor¬ 
mance  in  convertible  shape:  that  applies  partic¬ 
ularly  to  the  levels  of  abstraction  and  control, 
because  their  interfaces  undoubtedly  decide  whether 
what  is  possible  in  theory  is  actually  achieved  in 
a  practical  system. 

Costs are equally difficult to quantify,
and care must be taken to compare designs with
similar facilities. For software engineering,
controlled pointer formation is far more effective
than segment/page table control and costs much less
on a gate-for-gate basis. The most conspicuous
cost of microPN is the 16x16 bit planar arithmetic
unit, whose main contribution to the system is in
memory management. Its use enables the dedicated
control and scratchpad stores normally found in
microprogrammed machines to be dispensed with, thus
removing a serious obstacle to microsystem support
for high level languages; it also enables a far
more flexible approach to be taken in capability
management and program construction than has been
possible in earlier systems. Whether it will be
justified on balance remains to be seen.

Finally,  it  should  be  stressed  that  the 
mechanism  outlined  here,  while  appropriate  to  the 
control  of  program  space,  does  not  preclude  the  use 
of  other  abstraction  devices.  It  would  be  possible, 
for  example,  to  superimpose  a  capability  mechanism 
extending  into  the  file  space.  It  would  be  advan¬ 
tageous  to  deal  with  some  forms  of  abstraction  by 
'soft'  methods  in  the  confines  of  particular  lan¬ 
guages.  On  the  other  hand,  to  make  the  basic 
architecture  part  of  a  language  or  file  system 
specification  would  be  fundamentally  bad  design. 
1.0 Abstract

The input/output interface has traditionally
been a source of trouble in computer systems.
A hierarchical model, based on Finite
State Machines appropriate to both hardware
and software, is presented which addresses
these problems. This model is of interest for
several reasons: first, it suggests a structure
for the design of input/output subsystems;
second, it is amenable to automatic manipulation
using well-known algorithms (e.g. state
minimization); third, it is easily and efficiently
implemented in software, firmware, or
hardware; fourth, automatic generation of
tests is possible.


2.0 Introduction

While progress has been made in other
areas of computer system design, the
input/output area has been totally neglected.
We speak of an architecture as being 'language
directed' to indicate that it embodies the
philosophy of a language. We recognise an
instruction set, say, as being high level. Or we
construct a memory system to ensure an abstract
requirement such as security. But try as we
may, no guiding principles can be found for
input/output systems. About the only general
statement to be made is that data is transported
between the outside world and the
processor/memory.


High level languages have long been looked
to as unifying concepts for processor and storage
architecture. Significantly, input/output
interfaces are programmed almost universally in
an assembly language, not a high level
language. It is symptomatic of the lack of
progress in this area that the programs which
deal with i/o are still constructed in the most
primitive language. High level languages are
considered to be too inefficient. This points
out the lack of a unifying structure at the
input/output interface.

We wish to investigate input/output
interaction at the actual hardware/software
interface. Previous work [1,2] has emphasized the
notion of a device as an asynchronous process.
This is appropriate, since synchronization is
an important issue in dealing with peripheral
devices. This paper, though, deals with the
input/output system at a different level: the
actual hardware/software interface. The two
views are complementary in that we do not
remove asynchronous activity from the i/o area,
but rather present a more software compatible
view of the i/o interface for the device
processes to deal with.

3.0 Current Practice

Given that a particular piece of equipment
is to be connected to a computer system,
typically a hardware designer steps in and designs
a controller. The hardware designer is given
the device input-output characteristics; these
may involve a fairly large number of analog
and/or digital lines subject to varying
electrical, physical, and logical constraints. The
product of the hardware designer's labors is
the logical device visible to the programmer as
a set of i/o ports or memory registers. Then a
prototype is built and the hardware debugged.

Now a programmer enters the scene and
designs a device driver (or handler) to connect
the logical device to the operating system
(and, in turn, to application level programs).
The starting point for the programmer is the
logical device constructed by the hardware
designer. The logical device appears as a
collection of bits which represent status or
commands and a data register for data or
addresses. The lines to the device which had a
very distinct identity to the hardware designer
have become a homogeneous, somewhat anonymous
set of bits to the programmer. In the case of
the status and control bits, they may be mixed
together (note that status bits are to be read,
and command bits are to be written), and
incidentally grouped. The problem, though, is that
the programmer tends to view the device
one-dimensionally. All status bits or all
command bits are viewed as being equally important
on the same level. But all bits are not equally
important; the hardware designer understands
this and, for example, will not allow
the controller to function if the device is not
initialised. This one-dimensional view leads
the programmer to check the status of the
device thru such code sequences as:


    begin
        if statusbit(m) = on then ...
        if statusbit(n) = on then ...
        ...
        if statusbit(p) = on then ...
    end;

or

    begin
        if statusbit(m) = on then ...
        elseif statusbit(n) = on then ...
        ...
        elseif statusbit(p) = on then ...
    end

or some combination of the two. In the first
case, the number of possible paths thru the
code for n status bits is 2**n. Note that in
many device interfaces the importance of the
status bits is not at all apparent; that is,
there is no simple way to determine the
importance of the status bits. The programmer must
check bit 5, then bit 7, then bit 3, or check
different sequences of bits according to whether
a bit is on or not. Actually, the situation
is even worse: complex devices, such as
communication drivers, can require different
behavior to the same status depending upon the
prior history of the device.
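
The 2**n figure can be confirmed by brute force. The sketch below is an illustration rather than anything from the paper: the function names are hypothetical, and the n+1 count for the else-if form is an observation added here, not a claim made in the text.

```python
from itertools import product


def sequential_paths(n):
    """Each of n independent tests is either taken or not: count the outcomes."""
    return sum(1 for _ in product((False, True), repeat=n))


def elseif_paths(n):
    """An else-if chain stops at the first bit found on, or falls through."""
    return n + 1
```

For a device with eight status bits, sequential_paths(8) counts 256 distinct paths through the first form, against 9 for the else-if chain; neither form captures dependence on the prior history of the device.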

Finally, the programmer has a driver
design. He codes it and must debug it. This can
be a harrowing experience, for the programmer
is confronted with a new piece of hardware
which may malfunction, or he may have
misunderstood just how the controller works, or the
hardware designer may have given him a
controller which is difficult or unwieldy to deal
with, or his code may be incorrect, or ... (the
reader is invited to fill in other reasons).
The driver may not work and he can't tell if
the problem is hardware or his software, so he
calls in the hardware designer to help him.
But now, communication between the two may
compound difficulties.
This paper will attempt to solve these
problems. A finite state machine (FSM) model
will be presented which is suitable for
implementation in hardware, software, and firmware.
The FSM has several desirable properties which
make it attractive as a hardware/software
implementation vehicle. The designer is forced to
explicitly account for all situations which may
arise. It is sufficiently high level to serve
as a common design language while hiding low
level implementation details. It is amenable
to automatic manipulation using well-known
algorithms [3]. Imposing suitable restrictions on
the model (i.e., hierarchical structure), and
forcing the interface registers (the logical
device) to conform to a certain standard format
which is particularly economical (in hardware)
and efficient (in software), reduces the number
of states to a manageable set. In fact,
exhaustive testing may become feasible.
Automatic generation of tests is also possible
[4,5,6]. Lastly, the FSM collects together
sufficient information to provide a history of
operation which can be useful for checkout,
testing, and performance evaluation.

4.0 A General Model

In designing an I/O subsystem, both data
and control must be considered. At the operating
system interface, control is simple and the
data complex; at the device, the data is simple
and the control complex. For example, an
array (buffer) of words is presented to the I/O
subsystem with a request for transfer. The I/O
subsystem attempts the transfer and replies
with either success or failure. At the lowest
level, though, single words might be transferred
one at a time, with an acknowledgement
after each transfer. An error will cause
retries or a failure status to be returned.

The general model, then, is hierarchically
structured. Each level translates a single
command into a set of commands to a lower level
(the control consideration). Also, a data type
is translated into a different, more detailed
data type for the next lower level. Each level,
then, successively refines both control and data
to a more detailed form. Adjacent levels share
common control and data structures. The next
section will present a more specific model for
the realization of the I/O subsystem.
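
The refinement of one command into a set of lower-level commands can be sketched in Python. This is only an illustration of the general model, not the implementation described in this paper: the names transfer_block, send_word and make_flaky_device are hypothetical, and the retry limit is an assumption, since the text does not give one.

```python
MAX_RETRIES = 3  # assumed retry limit; the text does not specify one


def transfer_block(buffer, send_word):
    """Upper level: a single 'transfer' command for a whole buffer.

    The command is refined into one word-level command per element.
    `send_word(word)` stands for the lower level and returns True when
    the word was acknowledged. Each word is retried on error, and the
    whole block fails once the retries for a word are exhausted.
    """
    for word in buffer:
        for _attempt in range(MAX_RETRIES):
            if send_word(word):
                break  # acknowledged, go on to the next word
        else:
            return "failure"  # no acknowledgement after all retries
    return "success"


def make_flaky_device(fail_first_n):
    """Hypothetical device that rejects its first `fail_first_n` words."""
    state = {"failures_left": fail_first_n, "received": []}

    def send_word(word):
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            return False
        state["received"].append(word)
        return True

    return send_word, state
```

The caller sees only success or failure for the whole buffer, while the word-by-word acknowledgements and retries stay inside the lower level.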

5.0 Finite State Machines

The reader is assumed to be familiar with
the concept of a Finite State Machine (FSM)
[7]. FSMs will be briefly defined in order to
present notation. We shall deal with the Mealy
model of an FSM as it seems to offer technical
simplifications for our purposes.

A Finite State Machine is defined as a sextuple
< S, I, O, NSF, OF, s0 >

where

   S    -  a set of states
   I    -  a set of inputs
   O    -  a set of outputs
   NSF  -  a next-state function
           NSF : S x I -> S
   OF   -  an output function
           OF : S x I -> O
   s0   -  the initial state

Where there can be no confusion, we may
omit explicitly listing the various sets. We
will rely on context to implicitly define them
by giving the Next State and Output functions
either in tabular form as in Figure 1a or
in graphical form as in Figure 1b.

An FSM operates as follows: It begins
operation in its initial state. Receiving an
input, it performs some output dependent on its
state and input. Then it moves to another
state, again according to its current state and
input. The process repeats continuously.
Figure 2 shows a skeleton program which
implements this process.
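
The operating cycle just described can also be sketched in Python. This is a minimal sketch under stated assumptions: run_fsm is a hypothetical name, the tables are keyed by (state, input) pairs, and the table values copy the 2 x 2 tables of the Figure 2 skeleton with rows read as states and columns as inputs. Unlike the skeleton, the sketch stops when the input sequence is exhausted.

```python
def run_fsm(next_state, output, initial_state, inputs):
    """Mealy machine: the output depends on the current state and input."""
    state = initial_state
    outputs = []
    for symbol in inputs:
        outputs.append(output[(state, symbol)])  # perform the output
        state = next_state[(state, symbol)]      # then move to the next state
    return state, outputs


# Tables keyed by (state, input); values follow the Figure 2 skeleton.
NEXT_STATE = {(1, 1): 1, (1, 2): 2, (2, 1): 1, (2, 2): 2}
OUTPUT = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 2}
```

Calling run_fsm(NEXT_STATE, OUTPUT, 1, [2, 1, 1, 2]) returns the final state together with the list of outputs produced along the way.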

6.0 Hierarchical Finite State Machines

A Hierarchical Finite State Machine (HFSM)
is a set of machines M[i], i>0, such that

M[i] is an FSM:

< S[i], I[i], O[i], NSF[i], OF[i], s0[i] >
augmented by

< ES[i], ILF[i] >
where

ES[i] is contained in S[i]

ILF[i] : ES[i] -> M[j] where j>i .

As is often the case with
automata-theoretic definitions, the formalism
appears complex, yet the operation of the
defined machine is simple.

Intuitively, an HFSM is a collection of
FSMs with a mapping between the states of a
machine at one level and the machines of the next
lower level. That is, a state of a machine at
level i may be associated (by an Inter-Level
Function ILF) with a machine at level i+1. Not
all states need be mapped to a lower level
machine: the states that are so mapped are
termed explosive (the set ES in the above
definition). Figures 3,4 illustrate the
structure of a simple HFSM. The HFSM operates
similarly to an FSM with one exception: when
an explosive state is reached, the execution of
the HFSM at that level is suspended, and the
submachine corresponding to the explosive state
is activated. The submachine starts in its
initial state, and execution commences around its
state transition graph. The submachine may, in
turn, contain explosive states, in which case a
sub-submachine is recursively activated, and so
forth. When the submachine finally makes the
transition back to its initial state, the
operation of the submachine ceases, and the next
higher level machine (which invoked the
submachine) resumes operation. Note that since the
submachine initiates and terminates activity in
the same state, it has no memory of previous
incarnations.

We mention in passing that an HFSM is
exactly equivalent to a much more complex FSM.
Thus, an HFSM has no greater theoretical power
than an FSM. Practically, though, it has
several advantages:

1. Hierarchical structure which may be
designed and implemented in a top-down
fashion.

2. A clear separation of concerns
(inputs-outputs) at each level.

3. An HFSM may be implemented with less
memory than the equivalent FSM, since
the HFSM is a collection of small FSMs
rather than a large FSM. The next-state
and output functions grow as the
product of states and inputs, and a
single FSM may require a large amount of
memory to represent these functions.

6.1 Inputs

Inputs are usually specified in simple
examples as single symbols, for example, '0' or
'1'. Implicitly, we mean two distinct events:
first, that an input is present, and second,
that the input has some given value. We wish
to deal with asynchronous systems, so input
evaluation does not occur until an input is
present.

For certain systems, a single input symbol
may not be sufficient. In that case, an input
can be considered to be a condition which is to
be evaluated as true or false. Only one input
may be true. The single input symbol is a
special case; it is simply the condition
input = symbol.

6.2 HFSM Data

The previous section presented the flow of
control of an HFSM. To be useful, though, it
must be possible to pass data through the HFSM.
Several data buffers are provided to each
machine: an input-output pair to be used for
communicating with the next-higher level (i.e.
the invoking) machine, and an input-output pair
for each explosive state to be used for
communicating with submachines. Note that because
the HFSM is a strictly sequential machine, one
pair of data buffers may be used to communicate
with all next lower level machines. Four
primitives are provided for utilizing these
buffers:

1. Read from Above (RA) - Read the data
buffer containing data from the next
higher level machine.

2. Write to Above (WA) - Write data into
the data buffer of the next higher
level machine.

3. Read from Below (RB) - Read the data
buffer containing data written by the
submachine corresponding to the last
explosive state.

4. Write to Below (WB) - Write to the
data buffer which can be read by the
submachine corresponding to the
explosive state being entered.

Note that the data passed by the Write to
Below (WB) function to the next lower level,
and received there by the Read from Above (RA)
function, must agree in type. Similarly, the
Read from Below (RB) and Write to Above (WA)
functions must agree in type.
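
The pairing of the four buffer primitives can be sketched as follows. This is a hypothetical Python illustration, not the paper's implementation: the class names Link and MachineContext and the method names ra, wa, rb and wb are assumptions chosen to echo the abbreviations above, and no type checking is attempted.

```python
class Link:
    """One input-output buffer pair between a machine and its submachine."""

    def __init__(self):
        self.down = None  # written by the upper machine, read by the lower
        self.up = None    # written by the lower machine, read by the upper


class MachineContext:
    """The buffers visible to one machine of the hierarchy."""

    def __init__(self, above, below):
        self.above = above  # Link shared with the invoking machine
        self.below = below  # Link shared with the submachines

    def ra(self):
        """Read from Above: data written by the next higher level."""
        return self.above.down

    def wa(self, data):
        """Write to Above: data for the next higher level."""
        self.above.up = data

    def rb(self):
        """Read from Below: data written by the last submachine."""
        return self.below.up

    def wb(self, data):
        """Write to Below: data for the submachine being entered."""
        self.below.down = data
```

Because the machines run strictly one at a time, a single Link per level is enough: whatever the upper machine writes with wb is exactly what the submachine reads with ra, and the submachine's wa is what the upper machine later reads with rb.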


7.0 Hardware Implementation

The implementation of an HFSM in hardware
is fairly straightforward. It is similar in
operation to the software version presented
earlier. However, certain additions are made
in order to facilitate testing and to accommodate
the lower bandwidth communication channel
between the device controller and main memory.
Each machine may be implemented in its most
convenient form, most likely as a
microprogrammed controller [8]. In this connection note
that the control of all machines is identical.

Each machine must provide to the software
driving it the information listed in Figure XX.
All fields are encoded as small integers so
that simple indexed table look-ups and CASE
statements may be used to access the HFSM. The
machine ID field identifies the submachine.
The state, input, and output fields describe
the machine state and its environment. So far,
the hardware implementation is exactly the same
as the software version. One extra item is
added to the hardware version: the machine ID
interrupt level. This is a register loaded by
the software driver at initialization time
which specifies which machine's state
transitions cause interrupts (or equivalently, when
software interaction is needed).

Transitions of machines which have IDs not
equal to the machine ID interrupt level proceed
at their own rate. The machine specified in
this register is the highest level machine in
the hardware. This is the hardware machine
which interacts with the software machine.
The lower level machines are simple, execute
quickly, and do not require software
intervention.


8.0 Testing

Testing the hardware portion of the HFSM
is made possible by the variable
hardware/software interface, the machine ID
register. In the event of a hardware failure,
indicated by illegal state transitions and the
like, the software can test the hardware
portion of the HFSM. The test portion of the
software contains a duplicate implementation of
the lower levels of the hardware machines. Of
course, this duplicate is not used for normal
operation, but only for testing. The software
sets the machine ID interrupt level to the
next lower level machine and executes a
predefined test. It compares the execution of the
hardware machine to its own simulation and
records the differences. These differences
locate the faulty state transitions. If there
are no discrepancies, then it repeats the
process on the next lower level machine. It
continues checking lower level machines until
faulty transitions are isolated.
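
The comparison step of this procedure can be sketched in Python. The sketch assumes that the test software can observe the state the hardware machine reaches after each input; the function name find_faulty_transitions and the form of the observations are hypothetical, and the transition table is the same illustrative (state, input) dictionary style used for a software FSM.

```python
def find_faulty_transitions(next_state, initial_state, inputs, hardware_states):
    """Compare observed hardware transitions with the software duplicate.

    `hardware_states[k]` is the state the hardware reported after the
    k-th input. Each difference is recorded as
    (state, input, expected_state, observed_state).
    """
    faults = []
    state = initial_state
    for symbol, observed in zip(inputs, hardware_states):
        expected = next_state[(state, symbol)]
        if observed != expected:
            faults.append((state, symbol, expected, observed))
        # Follow the hardware so that later comparisons stay aligned.
        state = observed
    return faults
```

An empty result corresponds to the case of no discrepancies, after which the same comparison would be repeated for the next lower level machine.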

9.0 Unresolved Issues

While HFSMs are attractive as a means of
structuring hardware/software, there are
several areas of conventional usage which do not
fit well into the model.

1. Concurrent activity cannot be
expressed within the model. An FSM
cannot represent concurrent threads of
control. More general models, such as
Petri nets, can represent concurrent
activity [9] and have been used as
hardware/software models [10]. But
these models seem to lose some of the
essential simplicity of the FSM model.
Furthermore, many of the interesting
properties of these models are either
undecidable or computationally
expensive. In contrast, FSM questions are
all decidable, and for most FSMs
encountered in practice, of reasonable
computational expense. In passing, we
note that the FSMs which we have used
are fairly small (8-18 states and a
like number of inputs). References
[11,12] suggest a connection with
synchronization using regular path
expressions.

2. Certain types of exception conditions
are not cleanly handled. Asynchronous
exceptions arising from an external
source do not fit the model well as
they are not the response to some
action and may occur in the middle of
some conceptually indivisible action.
An example of this type of condition
is a power-failure indication. It is
not possible to guarantee that the
power fail occurs only when the
machine is in certain states at some
given level. One might simply include
a power-fail transition in every state
for every machine, but this is an
unsatisfactory solution.

10.0 Conclusions

The input/output interface has
traditionally been a source of trouble
in computer systems. Reasons for this
include a lack of communication
between hardware and software
designers, lack of a unifying framework
for hardware and software
specification, and an inability to completely
test hardware/software interfaces
realistically due to the large number of
states involved. The problem is
particularly apparent in the programs
which mate a piece of hardware (for
example, a peripheral controller) to
an operating system.
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procedure  fsm  ; 
type
   nextstatetype = array[1..2,1..2] of integer ;
   outputtype = array[1..2,1..2] of integer ;
const
   nextstate = nextstatetype (( 1, 2 )
                              ( 1, 2 )) ;
   output = outputtype (( 1, 2 )
                        ( 2, 2 )) ;
var
   currentstate : integer ;
   currentinput : integer ;

procedure getinput (var inp : integer) ;
begin { getinput }
   ...
end { getinput } ;

begin { fsm }
   repeat
      getinput (currentinput) ;
      case output[currentstate, currentinput] of
         1 : ...
         2 : ...
         n : ...
      end ;
      currentstate := nextstate[currentstate, currentinput] ;
   until forever ;
end { fsm } ;


Figure 2. FSM Skeleton Program.


procedure hfsm ;
type
   hfsms = record
      initialstate : integer ;
      currentstate,
      currentinput : integer ;
      nextstate : array[inputs, states] of integer ;
      output : array[inputs, states] of integer ;
      explosive : array[states] of boolean ;
      submachines : array[states] of integer ;
   end ;

begin
   currentstate := initialstate ;
   repeat
      getinput (currentinput) ;
      case output[currentinput, currentstate] of
         1 : ...
         2 : ...
         n : ...
      end ;
      currentstate := nextstate[currentinput, currentstate] ;
      if explosive[currentstate]
         then hfsm (submachines[currentstate]) ;
   until currentstate = initialstate ;
end ;


Figure 3. HFSM Program Skeleton
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Abstract 

The  software  problem,  measured  in 
such  terms  as  the  high  cost  required  to 
develop,  test,  debug,  and  maintain  pro¬ 
grams,  and  the  high  degrees  of  com¬ 
plexity  and  unreliability  in  programs, 
is  now  the  major  obstacle  to  computing, 
from  microprocessor  applications  to 
large-scale  systems.  One  partial  solu¬ 
tion  is  bringing  semiconductor  tech¬ 
nology,  in  the  form  of  improved  archi¬ 
tectures,  to  bear  on  the  problem.  In 
doing  so,  the  contention  is  that  machine 
architectures  should  not  be  oriented 
toward just programming languages, but,
more  importantly,  provide  mechanisms  on 
which  software  systems  concepts  can  be 
readily  based,  and  provide  a  more 
consistent  programming  environment. 

SWARD, an experimental architecture,
is discussed as an example of how a
machine architecture can assist in the
solution of the software problem.

Introduction 

There is widespread agreement that the
development of software is the largest
problem in the computer field today. The
problem is manifested in the following
ways.  First,  the  production  of  software  is 
a  costly  venture.  The  great  leaps  forward 
in  the  cost  of  digital  hardware  have  not 
been  experienced  in  software  development. 
Where,  in  the  past,  the  software  cost  of  a 
computing  system  was  outweighed  by  hardware 
costs,  the  opposite  is  the  case  today.  For 
instance,  the  cost  of  producing  a  single 
instruction in a program for a
microprocessor system probably exceeds the cost
of  the  processor. 

Second, in typical software-development
projects, more than 50% of the
development  costs  are  expended  in  the 
testing  and  debugging  processes.  Further- 


per  20  statements,  and  worse,  have  been 
reported  in  the  literature.  Hence,  a 
program  of  significant  size,  such  as 
100,000  statements,  might  initially  con¬ 
tain  5000  errors  prior  to  inspections  and 
testing. 

Finally, because of the increasing
sophistication  of  computer  applications, 
software  errors  can  have  rather  serious 
consequences. 

These problems will be exacerbated in
the future by the increasing sophistication
of new computer applications in such
areas as artificial intelligence, defense
systems, transportation and energy
management, and electronic fund transfer.

Software engineers and computer
scientists have been wrestling with the
software problem for the last decade.
Although improvements have been made in
some environments and organizations, the
problem is still a serious one. One
reason  is  the  recent  explosion  of  the 
amount  and  types  of  programs  being  pro¬ 
duced.  Ten  years  ago,  the  typical 
programmer  could  be  found  producing  a 
simple  Cobol  application  or  developing  an 
operating  system  for  a  computer  manu¬ 
facturer.  Today  we  find  a  much  larger 
programmer  population  developing  such 
applications  as  chess-playing  programs  for 
consumer  games,  fuel/air  mixture  regula¬ 
tors  in  automotive  microprocessors,  coro¬ 
nary-analysis  programs  in  medical  equip¬ 
ment,  collision-avoidance  algorithms  in 
aircraft systems, guidance programs in
nuclear missile warheads, and dispatching
systems  for  police  and  fire  equipment. 
Another  reason  is  that  the  largest  areas 
of  software-engineering  research,  namely 
improvements  in  programming  languages  and 
mathematical  proofs  of  program  correct¬ 
ness,  have  not  yet  had  a  significant 
effect  on  the  software-development  process 
in  industry. 


designer  to  be  interested  in  doing  so. 
Given  the  continuing  reduction  in  hardware 
costs,  the  processor  manufacturer  must 
sell its product in increasingly larger
volumes. Doing so requires increasingly
larger amounts of software, and requires
movement of computer technology into new
application areas. The rate of sale of
computer hardware, from microprocessors to
large-scale systems, is directly related
to how quickly the required system and
application software support can be
produced, and the reliability of that
software.


An  Approach  to  the  Problem 

The  answer  to  how  hardware  technology 
might  help  alleviate  the  software  problem 
is  not  the  simplistic  approach  of  "moving 
software  to  silicon,"  since  there  is  no 
evidence  that  the  problems  mentioned  above 
will disappear by merely shifting
responsibility for the design task from the
programmer to the circuit or logic designer.
Rather, the answer is designing machines
that provide less-hostile environments for
programs, programmers, and end users. The
architect must now face up to broader
considerations, such as

1.  Ways  in  which  the  architecture  can 
simplify  the  task  of  application  pro¬ 
gramming,  for  instance,  by  providing 
support  for  more-potent  concepts  of  input/ 
output  and  data  manipulation  in  pro¬ 
gramming  languages. 

2. Ways in which the architecture can
encourage the use of good software design
and programming practices, for instance by
providing efficient support for concepts
of program modularity, information hiding,
abstract data types, and structured
programming. The motivation here and in
point 1 is the prevention of programming
errors.


3.  Ways  in  which  the  architecture  can 
assist the costly processes of software
testing  and  debugging,  for  instance  by 
detecting  or  preventing  common  programming 
errors  and  by  providing  a  more-flexible 
base  for  the  development  of  software 
testing  and  debugging  tools. 

4. Ways in which the architecture can
reduce the complexity of one of the
most-complex classes of software, namely
compilers. Such support involves reducing
the semantic gap between languages and the
architecture by tailoring the operations
and objects provided in the architecture
more closely to the corresponding concepts
in programming languages.

5. Ways in which the architecture can
reduce the complexity of another complex
class of programs - operating systems.
This might imply increased awareness in
the architecture of such concepts as
protection, process management, process
synchronization and communication, and
memory management.

Considerations such as 4 [...]
been addressed in the literature [...]
have had little impact [...]
commercially available computer [...]

The SWARD Architecture


An example of an approach to solving
the software problem is an experimental
system under development at the IBM
Systems Research Institute. While the
current definition of the architecture has
not been published, it has evolved from
earlier published versions.

The five sets of considerations
listed in the previous section are the
design objectives of the architecture.
Detailed objectives were derived for each
of the categories. Many of these
objectives are mentioned in the following
discussion of the architecture.

The major attributes of the
architecture, and some of their relationships
to the software problem, are outlined
below.


Tagged  storage.  The  concept  of 
tagged,  or  self-identifying,  storage  is 
used  throughout  the  architecture  to  allow 
the  machine  to  understand  unambiguously 
the  attributes  of  the  operands  of  an 
instruction.  This  allows  the  machine  to 
detect  operations  on  incompatible 
operands  and  to  perform  automatic  data 
conversions  during  instruction  processing. 
Each  data  type  has  a  unique  representation 
for  the  "undefined"  state,  allowing  the 
machine  to  detect  attempts  to  use 
undefined  values. 
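
How tags let a machine check operands can be sketched in Python. This is only an illustration of the idea: the tag names, the Cell class, the add function and the conversion rule (integer plus float gives float) are assumptions made for the example and are not taken from the SWARD definition.

```python
class UndefinedValueError(Exception):
    """Raised when an instruction reads a cell that was never set."""


class IncompatibleOperandsError(Exception):
    """Raised when the operand tags do not suit the operation."""


UNDEFINED = object()  # stand-in for the per-type "undefined" representation
NUMERIC_TAGS = {"integer", "float"}


class Cell:
    """A self-identifying storage element: a tag plus a value."""

    def __init__(self, tag, value=UNDEFINED):
        self.tag = tag
        self.value = value

    def read(self):
        if self.value is UNDEFINED:
            raise UndefinedValueError(f"{self.tag} cell is undefined")
        return self.value


def add(a, b):
    """Generic addition whose behaviour is decided by the operand tags."""
    if a.tag not in NUMERIC_TAGS or b.tag not in NUMERIC_TAGS:
        raise IncompatibleOperandsError(f"cannot add {a.tag} and {b.tag}")
    x, y = a.read(), b.read()
    if a.tag == "integer" and b.tag == "integer":
        return Cell("integer", x + y)
    # Mixed operands: convert automatically, as a tagged machine could.
    return Cell("float", float(x) + float(y))
```

In the sketch, both the use of an undefined value and an operation on incompatible operands are detected at the moment of the operation rather than showing up later as corrupted data.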

The  tagged  data  elements  (called 
cells) are variable in size. The
architecture contains no fixed-size word
concept  and  permits  machine  instructions 
to  address  only  cells  as  operands;  hence 
the  data  model  provided  by  the  architec¬ 
ture  closely  corresponds  to  the  data 
models  in  programming  languages. 

Nested  tags.  The  tagged  storage 
concept  was  extended  to  allow  tags  to  be 
embedded  within  other  tags,  allowing  the 
representation  of  higher-order  data  types 
as  arrays,  structures/records,  and  user- 
defined  types.  The  machine,  rather  than 
the  program,  handles  the  task  of  array 
addressing, and automatically performs
bounds  checks.  The  architecture  also 
contains  explicit  representations  of 
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arrays  of  structures/records  and  "based 
variables." 

Capability  based  addressing.  The 
architecture  employs  the  addressing  and 
protection  concept  of  capability-based 
addressing.  The  architecture  views  the 
world  as  a  set  of  objects,  each  being 
given  a  unique  name  by  the  machine  when 
created.  Programs  cannot  fabricate  or 
manipulate addresses, and any reference to
an  object  after  the  object  has  been  des¬ 
troyed  results  in  a  detected  error. 
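
The detection of references to destroyed objects can be sketched as follows. This is a hypothetical Python model, not the SWARD mechanism itself: ObjectStore, Capability and DanglingReferenceError are invented names, and the unique names are simply integers that are never reused. Python cannot prevent a program from fabricating a Capability, so only the naming and dangling-reference behaviour is modelled.

```python
import itertools


class DanglingReferenceError(Exception):
    """Raised when a capability refers to a destroyed object."""


class ObjectStore:
    """Gives every created object a unique, never reused name."""

    def __init__(self):
        self._next_name = itertools.count(1)
        self._objects = {}

    def create(self, contents):
        name = next(self._next_name)
        self._objects[name] = contents
        return Capability(self, name)

    def destroy(self, capability):
        capability.deref()  # raises if the object is already gone
        del self._objects[capability.name]


class Capability:
    """The only way for a program to reach an object."""

    def __init__(self, store, name):
        self._store = store
        self.name = name

    def deref(self):
        try:
            return self._store._objects[self.name]
        except KeyError:
            raise DanglingReferenceError(
                f"object {self.name} no longer exists"
            ) from None
```

Because a name is never handed out twice, a stale capability cannot silently reach a newer object that happens to occupy the same storage.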

Capabilities  and  objects  are  used  to 
create  a  high-level  storage  model,  the 
elimination  of  traditional  low-level 
storage concepts being another objective
of the architecture. Figure 1 depicts a
possible state of the storage model. The
architecture recognizes five types of
objects, four of which (module, process
machine, port, data-storage object) are
explicitly created and addressed by
programs and one of which (activation record)
is implicitly created via a module
invocation.

Full  generality  of  allowing  capa¬ 
bilities  to  reside  in  objects  is  provided; 
capabilities  are  protected  by  their  being 
one  of  the  15  tagged  cell  (data)  types. 

As shown, the architecture also uses
capabilities to reference source/sink
(storageless) I/O devices.


Figure  1 


Single  level  storage.  The  concept  of 
virtual  storage  has  been  generalized  to 
the  extent  that  there  is  no  notion,  above 
the  architecture,  of  secondary  storage. 

For  instance,  the  concept  of  files  no 
longer  exists;  programs  use  arrays  to 
represent  what  would  have  been  considered 
to  be  a  file.  Hence  the  concept  of 
secondary-storage  I/O  has  been  eliminated; 
all  data  in  the  system  are  addressed  in  a 
uniform  way,  and  all  other  concepts  in  the 
architecture  (e.g.,  tagged  storage)  apply 
to  all  data  in  a  uniform  manner. 

Within  the  environment,  all  concepts 
of  storage  allocation  have  been  removed 
from  the  domain  of  software.  Although 
storage  allocation  does  occur,  it  is  done 
implicitly  by  the  machine,  for  instance, 
as  an  effect  of  a  module  invocation 
(where the machine creates an activation
record for the module's local variables).
Rather than being able to allocate space,
programs are presented with a function to
allocate occurrences of cell types, such
as strings and arrays (the dynamic
allocation of which is embodied in a
data-storage object).

Small protection domains. Each
subroutine or procedure of a program is
represented by a module object, which
contains the generated instruction stream
and a definition of the module's address
space (a set of tagged cells). This
structure  is  shown  in  Figure  2.  Instruc¬ 
tions  in  a  module  can  address  items  only 
within  the  private  address  space,  although 
well-controlled  indirect  references  can  be 
made,  via  parameters  and  capabilities, 
outside  of  the  address  space.  Thus  the 
architecture  enforces  rules  of  program 
modularity,  limits  the  consequences  of 
errors,  and  protects  a  program,  including 
the  system  software,  from  itself. 

Automatic  subroutine  management.  The 
architecture removes the burden of
subroutine management from the shoulders of the
compilers  by  containing  instructions  that 
perform  all  that  is  implied  by  a  subroutine 
call  in  a  high-level  language.  For 
instance,  the  CALL  instruction  saves  the 
state  of  the  current  module,  creates  and 
initializes  an  activation  record  for  the 
called  module,  switches  address  spaces, 
and  begins  execution  of  the  called  module. 
The  attributes  of  arguments  and  parameters 
are  verified  for  consistency  during  each 
call. 

Figure  2  shows  that  a  module's  add¬ 
ress  space  is  partitioned  into  two  sec¬ 
tions  -  the  "static  storage  die"  and 
"automatic  storage  die."  Cells  in  the 
static  storage  die  reside  permanently 
within  the  module  object.  When  a  module 
entry  point  is  called,  the  machine  creates 


Figure  2 

an  activation-record  object  containing  a 
copy  of  the  definition  of  the  cells  in  the 
automatic-storage-die  part  of  the  address 
space.  When  an  instruction  refers  to  a 
cell  in  the  automatic  storage  die,  the 
machine  automatically  maps  this  reference 
to  the  corresponding  cell  in  the  current 
activation  record. 

Hierarchical fault-handling
mechanism. The architecture contains a
uniform,  process-oriented,  rather  than 
system-oriented,  mechanism  for  the  hand¬ 
ling  of  error  conditions,  called  faults. 
Any  module  can  contain  a  special  fault¬ 
handling  entry  point  and  specify  which 
types  of  faults  can  be  handled  there. 

When  a  fault  is  detected  in  a  module,  the 
machine  searches  back  through  the  activa¬ 
tion  history  of  the  process,  looking  for 
the  first  module  that  has  indicated  a 
desire  to  handle  that  type  of  fault.  When 
one  is  found,  the  machine  "calls"  that 
entry  point  (i.e.,  simulates  a  subprogram 
call), passing it five arguments
describing the fault and the state of the program
at  the  time  of  the  fault.  What  happens 
after  that  is  a  function  of  the  fault- 
handling  software  in  the  module.  However, 
the  architecture  provides  several  instruc¬ 
tions  to  terminate  a  fault  handler  and  an 
instruction  to  explicitly  raise  fault 
conditions. 
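
The search for a fault handler can be sketched in Python. The sketch makes several assumptions that the text does not settle: the search starts with the most recently activated module, each module is modelled as a dictionary with hypothetical "name" and "handlers" entries, and the five arguments passed to the handler are collapsed into a single description.

```python
class UnhandledFaultError(Exception):
    """Raised when no module in the activation history handles the fault."""


def dispatch_fault(activation_history, fault_type, description):
    """Search back through the activation history for a fault handler.

    `activation_history` lists the active modules, oldest first. The
    first module found that declares a handler for `fault_type` has
    that handler called, as a simulated subprogram call.
    """
    for module in reversed(activation_history):
        handler = module["handlers"].get(fault_type)
        if handler is not None:
            return module["name"], handler(description)
    raise UnhandledFaultError(fault_type)
```

A module that declares no interest in a fault type is simply skipped, so the handling policy is decided by the software in each process rather than by one system-wide handler.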

Process machines. One of the five


A  process-machine  object  has  the  character¬ 
istics  of  a  hardware  processor  and  thus 
creates  a  multiprocessor  environment;  how¬ 
ever,  the  mapping  of  process  machines  to 
hardware  processors  is  a  matter  of  hardware 
implementation, not architecture. (At one
extreme,  a  single  hardware  processor  can 
time-slice  itself  to  act  as  all  process 
machines. ) 

By  creating  and  destroying  process 
machines,  programs  create  and  destroy 
processes.  In  keeping  with  the  design 
rules  followed  throughout  the  architecture, 
this  entity  defines  only  a  mechanism,  out 
of  which  programs  can  create  policies. 

Also,  it  is  orthogonal  with  other  concepts 
in the architecture (e.g., process machines
have  no  relationship  to  addressing). 

Send/receive  mechanism.  Two  machine 
instructions, SEND and RECEIVE, and an
abstract  object,  a  port,  are  provided  for 
interprocess  communication.  The  SEND 
instruction  is  defined  almost  identically 
to  the  CALL  instruction,  except  where  CALL 
transfers  control  and  a  set  of  arguments 
to  a  module  entry  point,  SEND  transfers  a 
set  of  argument  values  through  a  port. 

That  is,  it  transfers  data  but  not  control. 
As  with  the  subroutine  call  mechanism,  type 
checking  occurs  across  the  send/receive 
interface.  As  mentioned  earlier,  source/ 
sink  devices  are  represented  by  capabili¬ 
ties,  and  one  does  I/O  operations  on  these 
devices  by  use  of  SEND  and  RECEIVE. 

The mechanism is synchronous to the
extent that a process machine executing a
SEND instruction halts until another
process machine receives the transmitted
values. Thus the mechanism is similar to
the rendezvous concept in the Ada
language.

Generic  instructions.  The  concept  of 
tagged storage allows the architecture to
be  defined  with  a  small,  highly  regular, 
generic  instruction  set.  For  instance, 
there  is  only  a  single  instruction  for 
performing  addition  -  ADD  -  and  only  a 
single  instruction  for  transferring  values 
in  storage  -  MOVE.  The  semantics  of  the 
instructions  are  defined  by  the  attributes 
of  their  operands.  For  instance,  the  MOVE 
instruction  can  be  used  to  store  an 
integer  value  in  a  floating-point  data 
cell  (doing  an  automatic  data  conversion), 
store  one  character  string  in  another, 
store  a  scalar  value  into  all  elements  of 
an  array,  or  set  one  array  equal  to 
another.  One  of  the  benefits  of  this  is 
significant  simplification  of  compilers, 
particularly  the  code-generation  process. 
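
The way one generic instruction takes its meaning from operand attributes can be sketched in Python. This is an illustration only: cells are modelled as dictionaries with hypothetical "tag" and "value" entries, the tag names are assumptions, element types inside arrays are ignored, and the error cases are guesses at what a machine might reject.

```python
def move(source, target):
    """Generic MOVE: the action is chosen from the tags of the operands."""
    if target["tag"] == "array":
        if source["tag"] == "array":
            # Set one array equal to another.
            if len(source["value"]) != len(target["value"]):
                raise ValueError("array sizes differ")
            target["value"] = list(source["value"])
        else:
            # Store a scalar value into all elements of the array.
            target["value"] = [source["value"]] * len(target["value"])
    elif target["tag"] == "float" and source["tag"] in ("integer", "float"):
        # Automatic data conversion on the way into a floating-point cell.
        target["value"] = float(source["value"])
    elif target["tag"] == source["tag"]:
        target["value"] = source["value"]
    else:
        raise TypeError(f"cannot move {source['tag']} to {target['tag']}")
```

A code generator targeting such an instruction emits the same MOVE in every case and leaves the choice of action to the tags.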


Instruction  to  address  and  move  sub¬ 
strings  within  strings,  a  search  instruc¬ 
tion  to  search  an  array  for  a  matching 
value,  and  an  iterate  instruction  embody¬ 
ing  the  full  semantics  of  iterative  DO 
loops  in  such  languages  as  Fortran  and 
PL/I.

For  process  synchronization,  the 
architecture  contains  two  instructions 
named  GUARD  and  UNGUARD.  They  can  be 
used  to  prevent  simultaneous  execution  of 
two  or  more  processes  through  a  critical 
section  of  instructions  and  were  motivated 
by  the  software  design  and  synchronization 
concept  of  monitors. 12 
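
The effect of bracketing a critical section can be sketched with a lock in Python. Mapping GUARD and UNGUARD onto a mutual-exclusion lock is an assumption made for illustration; the operands and exact semantics of the two instructions are not given here, and the names Guard and run_workers are hypothetical.

```python
import threading


class Guard:
    """Stand-in for a guarded critical section, built on a mutex."""

    def __init__(self):
        self._lock = threading.Lock()

    def guard(self):
        self._lock.acquire()   # wait until no other process is inside

    def unguard(self):
        self._lock.release()   # let the next waiting process enter


def run_workers(n_workers, n_increments):
    """Several threads update one counter inside a guarded section."""
    section = Guard()
    counter = {"value": 0}

    def worker():
        for _ in range(n_increments):
            section.guard()
            try:
                counter["value"] += 1  # the critical section
            finally:
                section.unguard()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]
```

With the section guarded, the final count always equals the number of workers times the number of increments, since no two threads update the counter at the same time.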

Transparent indirect addressing. The concept of capabilities has been
expanded to allow capabilities to point to other capabilities such
that, if a program refers to a capability, the machine will interpret
this as a reference to the last capability in the chain. This concept
can be used for added levels of data security, by an operating system
for access control of objects, and to allow one to dynamically
replace objects (e.g., modules) in a program while the program is
executing.
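The chain-following rule can be sketched in a few lines. This is an illustrative model only: the `Capability` class, the `resolve` and `dereference` helpers, and the cycle check are assumptions made for the example, not part of the architecture as described.

```python
class Capability:
    """Points either at an object or at another capability."""

    def __init__(self, target) -> None:
        self.target = target


def resolve(cap: Capability) -> Capability:
    """Follow the chain and return the last capability in it."""
    seen = set()
    while isinstance(cap.target, Capability):
        if id(cap) in seen:
            # cycle detection is an addition of this sketch
            raise RuntimeError("capability chain contains a cycle")
        seen.add(id(cap))
        cap = cap.target
    return cap


def dereference(cap: Capability):
    """What a program sees when it refers to `cap`."""
    return resolve(cap).target
```

Because the program holds only the first capability, whoever controls an intermediate link (for example an operating system) can redirect it to a new module while the program keeps running.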

Program tracing facilities. Instructions exist to activate the
tracing of branches taken, branches not taken, and/or calls in
specified modules. When such events occur, they are treated by the
machine as faults and thus the fault-handling mechanism mentioned
above applies.

Additional security features. In addition to the protection concepts
of capabilities, small protection domains, and indirect capabilities,
the architecture contains additional security features, such as the
ability of a program to restrict the copying of capabilities, an
instruction to assign a new unique name to an object, and a second
level of protection provided by the use of tagged storage.

Semantic checking. One of the major objectives of the architecture is
detection of large classes of semantic errors in programs, errors
that are (1) frequent, (2) difficult to debug when they occur in
conventional systems, (3) common to many or all programming
languages, and (4) in general, not detectable at the time of program
compilation. Examples of a few of the 27 classes detected are (a) use
of undefined data values, (b) references to nonexistent array
elements, (c) the dangling-reference problem, (d) data type
ambiguities (e.g., inconsistent declarations of global data), and (e)
mismatching arguments and parameters. Studies have indicated that
these errors represent 30-50% of all errors in typical programs.
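Two of the error classes listed, (a) and (b), can be illustrated with a small software model of checked storage. The class and exception names below are hypothetical, and the sketch only mimics in software what the text says the machine detects during execution.

```python
class Fault(Exception):
    """Base class for errors the machine would report as faults."""


class UndefinedValueFault(Fault):
    pass


class SubscriptFault(Fault):
    pass


_UNDEFINED = object()   # marker for a cell that has never been stored into


class CheckedArray:
    """Array whose every access is checked, in the spirit of tagged storage."""

    def __init__(self, size: int) -> None:
        self._cells = [_UNDEFINED] * size

    def _check(self, index: int) -> None:
        # reject nonexistent elements, including negative subscripts
        if index not in range(len(self._cells)):
            raise SubscriptFault(f"element {index} does not exist")

    def store(self, index: int, value) -> None:
        self._check(index)
        self._cells[index] = value

    def load(self, index: int):
        self._check(index)
        value = self._cells[index]
        if value is _UNDEFINED:
            raise UndefinedValueFault(f"element {index} has no value yet")
        return value
```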

Virtual machine. Although not an explicit objective of the
architecture, attributes of the architecture, such as capabilities
and objects, have given it the characteristic of being a
virtual-machine environment, meaning that programs can exist having
no relationship to the operating system, and multiple
operating-system environments can coexist.

Relevance  of  SWARD  to  the  Software  Problem 

The SWARD architecture is unique in that almost every aspect of the
architecture was motivated by a desire to alleviate the software
problem. The major ways in which this is achieved are discussed
below.

The extensive semantic checking performed by the machine should
enhance significantly the productivity of the software testing and
debugging processes, and lessen the consequences of errors occurring
in production programs.

The object orientation of the architecture, and the use of
capability-based addressing, presents a highly uniform system
environment. The objects of the architecture (modules, process
machines, ports, data-storage objects), as well as source/sink I/O
devices, are addressed in an identical fashion. This has important
implications on the complexity of system software and the user
environment. For instance, where conventional systems contain a
variety of dissimilar mechanisms for the binding of entities (e.g., a
"linkage editor" for binding program modules together,
control-language statements and "open" services for binding programs
to files), an operating system can be defined with a single uniform
concept of binding.

The  single-level  store  concept, 
particularly  when  carried  forth  into 
programming  languages,  largely  eliminates 
the  need  for  I/O  concepts,  allowing  the 
programmer  to  think  of  data  in  a  uniform 
way. 

The use of the SEND and RECEIVE instructions as the basic I/O
primitives for source/sink devices, as well as for interprocess
communication, has several benefits. First, it adds another measure
of uniformity to the system, since, for instance, there is no
difference among sending a character string to a printer, terminal,
or another process through a port. Hence there is only one concept of
data transmission. Second, it allows one to substitute processes for
I/O devices, or I/O devices for processes, without changing one's
program. Third, the send/receive mechanism is synchronous with
respect to whatever is on the other side (I/O device or process).
Hence there is only one concept of parallelism in the system - the
process. There is no concept of an interrupt.
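The substitution argument can be illustrated with a small sketch in which the sender cannot tell whether a device or a process is on the other side of a port. Everything here is an assumption made for illustration: the `Port` class, the two receiver functions, and the restriction to a single sender per port.

```python
import queue
import threading


class Port:
    """Synchronous port: send() returns only after the value is received.

    This sketch assumes one sender at a time on a given port.
    """

    def __init__(self) -> None:
        self._slot = queue.Queue(maxsize=1)

    def send(self, value) -> None:
        self._slot.put(value)
        self._slot.join()          # halt until the receiver has taken it

    def receive(self):
        value = self._slot.get()
        self._slot.task_done()     # releases the halted sender
        return value


def printer_device(port: Port, paper: list) -> None:
    """Stand-in for a source/sink device: 'prints' whatever arrives."""
    paper.append(port.receive())


def consumer_process(port: Port, inbox: list) -> None:
    """Stand-in for an ordinary process that transforms what it receives."""
    inbox.append(port.receive().upper())


def transmit(receiver, sink: list) -> list:
    """The sending side is identical whichever receiver is attached."""
    port = Port()
    other_side = threading.Thread(target=receiver, args=(port, sink))
    other_side.start()
    port.send("report")
    other_side.join()
    return sink
```

Swapping `printer_device` for `consumer_process` requires no change to `transmit`, which is the uniformity the paragraph describes.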

Other unifying ideas, all of which serve to make the programming
environment a less-complex and less-hostile one, are the
fault-handling mechanism for error handling, capability-based
addressing for information sharing and protection, the highly generic
instruction set, and no need for a privileged instruction state.

The development of well-structured programs, employing concepts of
modularity, information hiding, and parallel processes, is encouraged
by the machine concepts of an efficient subroutine-management
mechanism, small protection domains, the fault-handling mechanism,
the single-level store, the GUARD, UNGUARD, SEND, and RECEIVE
instructions, and others.

The points above apply to the programming environment in general, but
several additional points can be made about compilers, operating
systems, and data-base management. Because of the concepts of tagged
storage, direct recognition of higher-order data types such as arrays
and structures, the generic instruction set, and the power of the
instruction repertoire, the development cost and complexity of
compilers should be significantly reduced.

For many of the same reasons, and because of other facilities in the
machine, the overhead and development cost of
high-level-language-oriented testing and debugging tools should be
greatly reduced.

The architecture also eliminates much of the traditional complexity
of operating systems and other subsystems by removing from them the
problems of memory management, protection, process synchronization,
interprocess communication, and interrupts.

The use of generic instructions and tagged storage implies the
latest-possible binding of instructions and data; the semantics of an
instruction are determined at the time of its execution, using the
information in the tags of its operand cells. SWARD extends this even
further by allowing the programmer to incompletely specify the
attributes of a local variable in its tag; this allows a local
variable to acquire dynamically some or all of its attributes (e.g.,
from a parameter). These points have significance to the concept of
data independence in data base environments.

Given the magnitude of the software problem today and an appreciation
for how much worse it will be tomorrow, and given the rapid advances
in hardware technology, the time seems ripe for major architecture
redirections that make fundamental improvements in the programming
environment. The SWARD architecture serves as an example of how a
machine architecture can reduce software complexity and lessen the
difficulty and error-proneness of program design, coding, testing,
and debugging.

ABSTRACT

Given the advances of technology, it is not unreasonable to project
the existence of multiple processor configurations that have large
numbers of processors with a variety of interconnection
possibilities.

This paper discusses language constructs for interprocess
communication and process creation functions which would be
fundamental to systems that run sets of programs dispersed across
families of logical processors. Certain divergences between various
concepts of interprocess communication are resolved in a single
design.

INTRODUCTION 

Recent dramatic developments in processor/memory technology and in
interconnection methodologies for the association of processors with
each other suggest that future multiple processor configurations may
have large numbers of fast and cheap processors with a variety of
memory sharing possibilities [1,2,3,8].

An objective of such multiple processor systems will be the need to
quickly and dynamically react to the changing demands on the system.
This will imply the need to not only group a set of processors to
work on a given set of applications but will also imply the need to
dynamically partition memory spaces which are physically common
amongst these sets of processors. For convention we will refer to a
set of processors and memory formed dynamically as a logical system.

In such systems sub-configurations of closely cooperating generic
multiprocessors may be formed and partitioned sets of 'distributed'
configurations may be formed between units with a rich diversity of
decisions about memory sharing, code replication, etc. The intent of
the concept, of course, is to allow systems to take shapes
appropriate for their applications. To support this notion various
speeds of memory and processors would be available so that various
concepts of application speed and partitioning can be supported by
decisions about processor and memory speed and capacity.

Some important concepts of dynamic configurability should exist in
the system. Sets of closely cooperating, memory-shared processors
should be dynamically definable for short periods; cooperating or
independent sub-configurations of "distributed" systems should also
be definable for brief periods. When desirable, permanent "gangs" of
associated processors at different levels of memory and operating
system sharing should also be definable within the total population
of processors, memories and other resources of the system. A goal of
such a system is to make maximum use of the well known concept that
logical system structures of varying kinds of 'relationship' and
closeness of cooperation can be mapped onto physical structures.


DESIGN  CONCEPTS 

A very well known way of structuring an operating system is to define
vertical partitions of functions such that there is a functional
module for I/O, memory management, process communications, process
synchronization, etc. The great advantage to the structure, of
course, is that it allows multiple parallel services to be achieved
in multiple processor environments.

The structure can be supported by hardware in a number of ways. Each
functional module can be located in protected address spaces in a
large single physical processor/memory. An interesting attribute of a
capability, object storage based architecture such as SWARD [6] is
that the physical configuration of memory is logically irrelevant.
Configurations can be formed with various degrees of shared or
private physical memory without altering the logic of the object
management systems.

The ability to assign some number of processors of any architecture
to a system suggests that these processors may be used as Global
Service processors, each assigned to a significant operating system
function of the type suggested above. There may be a Systems Wide
Message Handler, a Systems Wide Global Scheduler, a Systems Wide I/O
server, etc.

In an alternative structure, each processing node could be composed
of two processing nodes. Conceptually one might think of a Problem
State element and a Supervisor State element. All those activities
which would be executed in supervisor state in S/370 architecture
would be executed in one element, while all those in problem state in
another element. Although this serves as a conceptual example, it is
not clear that this particular partitioning of function between
elements of a node is the proper partitioning point. The discovery of
a proper partitioning between computational element and operating
system element depends upon a number of factors which include
frequency of function, instruction set restrictions, the degree of
asynchroneity, etc. A fall back concept is to view the operating
system element as a kind of network processor which becomes involved
only when the associated computational processor issues a request
which will involve interaction with another station in the network.
This may be falling back too far since it places in the computational
processor the burden of determining when an off-station reference
must be made and this effort may be large compared to making the
interaction itself. It is preferable for the operating system
processor to determine what and when off-station references must be
made while the computational processor proceeds with other available
work.

In such a system, where each node is comprised of at least an
operating system processor and a computational processor, each
operating system element has a functionally equivalent local
operating system that participates in global system decisions and
global system services as well as providing local support. This
constitutes the basis for a completely distributed control system in
which intensive interaction is sustained between stations and where
negotiation and co-operation lead to system wide decisions about work
distribution. A best processor for a unit of work may be discovered
by interaction and negotiation with operating system elements aware
of what their associated computational processor is doing. This
negotiation goes on without disturbing the progress of available work
on a local dispatch list. The following sections address these design
objectives with respect to the language constructs and control
structures for process creation and inter-process communication.

TERMINOLOGY 

In the system we are about to describe we introduce the following
terminology:

PROCESS CONTROL TERMINOLOGY

APPLICATION - An application defines a context by indicating a set of
procedure and data objects that may be accessed and states the rules
of reference. The application describes both the physical and
abstract resource constraints necessary and permissible for processes
belonging to the application.

PROCEDURE - A procedure is program text and capabilities for an
incarnation of a process or an instance of activation. Its unique
feature in this system is a statement of consumable resource
constraints in addition to the list of abstract resources (such as
files, data bases, locks, etc.) necessary for successful processing.

PROCESS - The system dispatchable unit. A stack of activation
records, each associated with a procedure, resting upon a process
activation block that may be used for recovery. A process is named.

USER - Each user of the system is, of course, defined to the system.
Part of this definition is a list of the total set of applications
which the user can connect to.

PROCESS COMMUNICATION TERMINOLOGY

PORT - A port may be a to_port or a from_port. On a sender's side a
from_port is a named place in a sender's program (e.g. a declared
structure in the PL/I sense) serving as the source of a message to be
sent. A to_port is the name of a receiving process. On a receiver's
side a to_port is a named place in the receiver's program where the
message will be placed. A from_port is the name of a sending process.

PATH - A path can be either a queue name or a file name. The path
represents an indirection from the sender to either a specific
receiver or to an arbitrary receiver.

The exact details of inter-process communication will be deferred
till the section on process communication.


PROCESS  CREATION 

One of the major objectives in introducing new language constructs is
to insure that in so far as is reasonable the language for
application programming is the same as the user's command language.
Not having this as an objective results in increased complexity in
requiring a user to learn more than one language for performing the
same identical function. For simplicity, PASCAL is used as the
language for syntax expression in this section and in the next
section [7,9], though the language constructs presented are not
unique to PASCAL.

In the system we are presenting, there are users who initiate
processes (which can initiate still more) which run under a given
application scope. Consequently, the following declarative
structures:

    type user = record
        application_set: set of application;
        default_appl: application;
    end;

    type application = record
        name_space: set of name_pair;
        default_process: procedure;
        processor_resource: proc_req;
        abstract_resource: set of resource;
    end;

    type name_pair = record
        name: alfa;
        object: object_descriptor;
    end;

    type proc_req = record
        min_processors: integer;
        max_processors: integer;
        min_memory: integer;
        max_memory: integer;
        instruction_set: machine_type;
        performance: set of perform_req;
    end;

    type procedure = record
        entry_point: program;
        name_space: set of name_pair;
        processor_resource: proc_req;
        abstract_resource: set of resource;
    end;

The fields of the above records are described as follows:


USER - The application_set is a list of application names. These
represent the total set of allowable applications that a given user
is allowed to access. The default_appl is the application that a user
will be automatically connected to when he LOGONs to the system. This
field is optional. Creation of an object of type user, assuming the
creator has the 'right' to create such an object (e.g. var hal,
harry: user), results in the system creation of a user object. When a
user issues a LOGON to the system (e.g. LOGON hal), the system
searches for the user object named hal. If not found then the LOGON
is rejected, otherwise the user object is searched for a default_appl
name (e.g. hal.default_appl ≠ null). If specified, then the user will
be connected to an instance of that application (one will be created
if it does not already exist).

APPLICATION - An application defines the universe of accessibility
for all processes and users connected to it. The name_space is,
therefore, a set of name_pairs (representing the objects that can be
accessed). The first element in the pair is the name of the object,
and the second is the descriptor of the object mapped to by the name.
Included in the descriptor are the rights of access (such as the
primitives Read, Write, Execute), and the type of the object (such as
queue, procedure, file, nested

The default_process is the name of the procedure to be invoked (as a
process) when an instance of the application is created. This field
is optional and allows an application to implicitly initialize its
relevant structures. Of course, resolution of the name is through the
defined name_space.

The processor_resource represents the processor requirements of the
application and is self explanatory. Upon completion of the creation
of an instance of an application, processor resources are allocated
to the application. The allocated set is referred to as a gang. All
created processes that belong to an application run in the gang
associated with that application.

The procedure represents a model for either the creation of a process
(either implicitly or through usage of the START command to be
discussed below) or the creation of an activation within a process
(through the standard program call interface). The processor
requirement specification in the procedure allows for the tailoring
of a given process to the consumable resource requirements of the
program. Specifically, performance objectives such as priority, degree
of I/O boundedness, deadlines, etc. can be stated in the performance
requirement of the procedure. The name_space defines the scope of
accessibility of the activation spawned from the procedure. If a
name_space is not present in the definition then the caller's
name_space is assumed. It should be noted that there need not be an
intersection between application.name_space and a name_space defined
in a procedure whose name is in application.name_space.
Procedure.abstract_resource identifies which resources, such as data
bases, files, locks, etc. have to be allocated before
process/activation creation can occur.

Given this basis, we can now address process and application
creation. Previously, we have shown how an application (and its
default process) can be created as a result of a user performing a
LOGON. This in itself is not novel and is typical of many interactive
systems.
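The LOGON behaviour described above can be sketched as follows. This is a minimal model under stated assumptions: the `System` class, the `LogonRejected` exception and the `connected_users` field are hypothetical names introduced for the example, and creation of the default process is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class User:
    application_set: Set[str]             # applications the user may access
    default_appl: Optional[str] = None    # optional field


@dataclass
class ApplicationInstance:
    name: str
    connected_users: List[str] = field(default_factory=list)


class LogonRejected(Exception):
    pass


class System:
    def __init__(self) -> None:
        self.users: Dict[str, User] = {}
        self.instances: Dict[str, ApplicationInstance] = {}

    def logon(self, user_name: str) -> Optional[ApplicationInstance]:
        user = self.users.get(user_name)
        if user is None:
            raise LogonRejected(user_name)    # no such user object
        if user.default_appl is None:
            return None                       # logged on, not yet connected
        instance = self.instances.get(user.default_appl)
        if instance is None:
            # an instance is created if one does not already exist
            instance = ApplicationInstance(user.default_appl)
            self.instances[user.default_appl] = instance
        instance.connected_users.append(user_name)
        return instance
```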

We will now discuss explicit process and application instance
creation. The command START is used for both process and application
instance creation, and has the following format:

    START variable, name

This command is identical to the form that would be used within a
process to create another process or application. The variable is the
name of the entity being started. If the issuer is not already
constrained within an application instance then START searches the
user block to determine if the variable is a valid application name.
If it is not then the request is rejected. Otherwise, an application
instance is created and resources allocated (note: the consumable
resources allocated are transparent to the caller and are only known
by the system). If the application has a default_process defined then
that process is implicitly created.

If the issuer is already constrained to a given application instance
(either through a LOGON or START) then the variable is initially
treated as a procedure name. In this case, a search is made from the
application.name_space (for a first time process creation) or from
the name_space associated with the issuing process. If a procedure is
not found from the search, then a search is made from the user block
treating the variable as an application name. If the procedure is
found then a process is created. The caller is not aware of where the
created process is running.

The name specified on START is the caller known name of the created
entity. If a process was created then the application.name represents
the unique name of the created process. The START request will fail
if the caller specified name is already associated with another
process in the application instance. The usage of this name will
become apparent in the discussion on inter-process communication.
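The search order that START follows can be summarized in a short sketch. The names below are hypothetical: `user_block` is modelled as a mapping from application name to that application's name_space, `AppInstance` and `StartError` are invented for the example, and creation of the default process and resource allocation are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class StartError(Exception):
    pass


@dataclass
class AppInstance:
    name: str
    name_space: Dict[str, str]                               # procedure name -> procedure
    processes: Dict[str, str] = field(default_factory=dict)  # caller-known name -> procedure name


def start(variable: str, name: str, user_block: Dict[str, Dict[str, str]],
          current: Optional[AppInstance] = None,
          issuing_name_space: Optional[Dict[str, str]] = None):
    """Return ("process", name) or ("application", AppInstance)."""
    if current is not None:
        # issuer is constrained to an application: try a procedure name first
        space = current.name_space if issuing_name_space is None else issuing_name_space
        if variable in space:
            if name in current.processes:
                raise StartError(f"name {name!r} already in use")
            current.processes[name] = variable
            return ("process", name)
    # unconstrained issuer, or no procedure found: try an application name
    if variable not in user_block:
        raise StartError(f"{variable!r} is not a valid application name")
    return ("application", AppInstance(variable, dict(user_block[variable])))
```

The caller receives only the name it supplied; where the created process runs is decided by the system and is not visible in the result.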


What has been shown is a very simple way to effect process creation.
Applications and processes can be created and as a result logical
systems can be formed dynamically and without explicit installation
information. The decision over whether the logical systems are
distributed or tightly coupled becomes purely one of application and
procedure definition which, of course, can also be dynamically
modified.

Dynamic changes to resource consumption rights and processor
scheduling constraints associated with any creation of a procedure
may be made by simple use of the declarative structures of the
language.

The scheduling constraints which may be associated with START suggest
that rather complex global systems management of the type associated
with large scale uniprocessors may be a feature of a multiple
processor aggregative system. These scheduling rules may be enforced
by a global systems scheduler node or by co-operative interaction
between a set of operating system processors which are associated
with the computational processors of the system on a one-to-one or
one-to-many basis.

INTER-PROCESS COMMUNICATION


Given processes, the next step is to provide them with a means for
effecting inter-process communication. For this function we will
postulate the existence of an Inter-Process Communicator (IPC). The
IPC can exist either as a central service process or can exist as a
distributed service in each of the logical systems defined. Its
physical existence is largely irrelevant. What is important is that
the services it provides remain invariant no matter where the IPC
physically resides. Thus no application recoding should be required
if, for example, the decision to have a global IPC proved to be
wrong.

We will postulate a Send/Receive mechanism with five verbs: CONNECT,
SEND, RECEIVE, SIGNAL, and DISCONNECT. These verbs allow processes to
communicate with each other directly, or through named objects in a
synchronized or asynchronous manner. The ability to send and receive
between processes and objects permits I/O to be subsumed into the
communication mechanisms. CONNECT establishes a path between an
issuing process and any named object of the system. Thus a process
may subsequently SEND messages to another process or a named data
object. Messages sent to other processes may be passed through queues
or sent directly to ports of a receiving process. A Receiver may ask
for messages from a data object, a queue, or another process. One to
many relationships may be defined to support message broadcasting,
handling of a message from any of a set of possible servers, etc.

An important potential feature of an Inter-Process Communication
(IPC) mechanism on an architecture with self describing data [6] is
the use of the CONNECT verb to declare a message template which
provides a description of the message structures which are to move
between processes. A concern in IPC mechanisms is uncertainty about
whether a sender or receiver is at fault when message formats do not
match. This may occur because of programming errors which name a
wrong port or queue for transmission or receipt of a particular
message. The provision of a message template gives the IPC a means of
determining whether wrong messages or badly formed messages are the
responsibility of the sender or receiver. In systems where data is
self describing, an IPC can check the tags of a message for agreement
with the message template. Each receiver presents a template of the
expected message format which is also checked against the message
template provided by the CONNECT, which has the following format:


    CONNECT connect_point

where:

    type connect_point = record
        ports: set of message_areas;
        path: set of (queue, file);
        message_template: message_format;
        case (send, receive) of
            send: (to_port: process_name);
            receive: (receive_point: procedure;
                      from_port: process_name);
    end;

If CONNECT is used the parameters may come:

1. totally from the sender

2. totally from the receiver

3. in some combination of both

The CONNECT verb can be used by both Senders and Receivers (hence the
usage of case in defining the connect_point type).

Ports specifies the location of the message areas in the issuing
process to be used either as the location for receipt of messages or
for the submission of messages.

Path is optional and specifies an indirection point in the
transmission of the message. The object of the path is physically
owned and managed by the IPC. Usage of a path in the transmission of
a message guarantees the recovery of that message. Queues are
transient and exist as long as the process which requested their
creation (this process can be different from either sender or
receiver and could represent a caretaker process). Files are
permanent and have to be explicitly destroyed. Submission of a
message through a path guarantees, in general, the persistence of
that message even though the sender and potential receiver go through
untimely termination. Both FIFO and LIFO queueing techniques are
applicable with queues and files and are specified when the object is
created.

The message_template specifies the sender's/receiver's model for
the message. It indicates, for example, the length of the message,
what the encoding of the message is (ASCII, Packed Decimal, etc.),
and its format (for a multi-segmented message). This template is
used by CONNECT for comparison with a template associated with
the path. A sender's and a receiver's template are each compared
with the path template. If there is a disagreement between the
path template and that of sender or receiver, the process with the
divergent template is notified of a message type error. If there
is no path, the sender's and receiver's templates are compared
with each other. In case of an error both processes are informed
of a mismatch. This feature is most practical for hardware systems
that have strong features of self-describing data and tagged
memory.
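The comparison rule can be sketched in a few lines of Python. This
is only an illustration under assumptions: the paper does not give
the layout of a template, so the tuple (length, encoding, format)
and the name check_templates are hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical template layout: (length, encoding, format).
Template = Tuple[int, str, str]


def check_templates(sender: Template, receiver: Template,
                    path: Optional[Template] = None) -> Set[str]:
    """Return the parties that would be notified of a message type error."""
    if path is not None:
        # With a path, each side is checked against the path template and
        # only the divergent side is notified.
        sides = (("sender", sender), ("receiver", receiver))
        return {name for name, template in sides if template != path}
    # Without a path, the two templates are compared with each other and
    # both sides are informed of a mismatch.
    return {"sender", "receiver"} if sender != receiver else set()
```

An empty result means that CONNECT would accept the templates.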

If the case is for SEND then to_port refers to the process name
of the process that is to receive the request.


If the case is for RECEIVE then the receive_point refers to a
procedure that is to be invoked when another process issues a
SEND (not through a path) to that process. The receive_point
represents a point of interruption for asynchronous receipt of
messages.

It is possible to support IPC without a CONNECT. If no CONNECT
is issued, processes may communicate directly or indirectly
using the usual accessibility control mechanisms of the operating
system which provide for registering names of processes and path
objects. In this usage, full specification must occur with SENDs
and RECEIVEs. The penalty for such use is increased risk of run
time failure.


In case (1) or (2) the parameters associated with the CONNECT
are imposed by the system upon the relationship. Case (3) raises
interesting considerations that have not yet been fully explored,
as to the degrees of freedom between SEND and RECEIVE parameters.
For example, a CONNECT issued by a receiver that names a path
could be considered inconsistent with a CONNECT issued by a
sender which did not name a path. However, we may convince
ourselves that there is some advantage in having the path
transparent to one side of the send/receive relation.

The  operation  of  sending  a  message  can  now  be  described: 

SEND token, from_port, path, to_port

SEND has four operands. The from_port specifies which message
area in the sending process contains the transmission message.
The path specifies an indirection path for the message (as
described above) and the last operand, to_port, identifies the
process that will receive the message. SEND automatically blocks
the issuer until either the message has been placed on a path (if
specified) or the receiving process (if no path has been
specified) has received the message.

Specification of the three operands (from_port, path, to_port) is
optional; they can be derived from the preceding CONNECT.
Their inclusion on SEND is to allow an area to be used to send
messages to more than one connect point.

In fact, if to_port is not specified in either CONNECT or SEND,
then path must be specified in either. In this case, the message
will be placed on the queue or file by the IPC and the sender will
be SIGNALed to remove it from the blocked state. Such messages
can be removed by any process which has IPC access to the path.

If path is not specified in either CONNECT or SEND, then a
to_port must be specified. In such a case, the message is sent
directly to the receiving process (if it has an outstanding
CONNECT or RECEIVE). If there is no outstanding RECEIVE, then
the receive_point identifies the procedure to be invoked and an
activation is immediately created and made the current one. The
deblocking of the sender is then the responsibility of the
receive_point activation, which should issue a SIGNAL to indicate
receipt of the message. If a RECEIVE has not been issued and
the CONNECT does not define a receive_point then the IPC will
implicitly queue the message, leaving the sender blocked, until a
RECEIVE is issued. It is still the receiver's responsibility to
deblock the sender. The system events that occur when there is an
outstanding RECEIVE are discussed below when we describe
RECEIVE.




The token is the unique identifier of the message and is assigned
by the IPC. It is this token that is used by SIGNAL to indirectly
deblock the sender. It is also used by the sender to later
determine the status of a submitted message (e.g. still on a path,
received, etc.). Similarly, if a RECEIVE is issued without a
receive_point specification in the connect_point, then the
receiver is blocked until a message arrives for it.

SIGNAL is then simply of the form:

SIGNAL  token 

RECEIVE  is  similar  in  form  to  SEND: 


RECEIVE token, to_port, path, from_port


where (to_port, path, from_port) refer to the message area to
receive the message, the path (or indirection for the message), and
the sending process name (optional). Token is the unique
identifier of the transmitted message, returned upon successful
completion of this operation.

If path is not specified in either RECEIVE or CONNECT then the
from_port must be specified. In such a case, the receiver is
asking for a message from a specific process and will either wait
or continue asynchronously (in the event that a receive_point is
specified in the connect_point). On issuing a RECEIVE, a
receiving process will get a message if a message is waiting in
the IPC mechanism. If there is no message and there is no named
receive_point procedure associated with the CONNECT, the process
will be blocked. If there is a named receive_point, the process
will be permitted to proceed asynchronously.

Similarly, if the from_port is not specified on RECEIVE or
CONNECT, then the path must be specified. Path identifies a queue
or file from which the receiver is willing to receive messages
from any process using this path. The receiver will be able to
receive messages sent either to this path alone or to the pair
(path, to_port = receiving process name). The receiver cannot
receive messages sent to the path and directed to another process,
as a path can contain messages directed to more than one process
from more than one process.

If a from_port and path are specified, then the receiver can
receive messages sent to the path from only the specified process.
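The rules for receiving through a path can be summarised as a
small filter. The following Python sketch is an illustration only:
the names QueuedMessage, receivable and receive_from_path are
hypothetical, the path is modelled as a plain list, and FIFO order
is assumed (the paper allows LIFO as well).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedMessage:
    token: int               # unique identifier assigned by the IPC
    sender: str              # name of the sending process
    to_port: Optional[str]   # named receiver, or None if sent to the path only


def receivable(msg: QueuedMessage, receiver: str,
               from_port: Optional[str] = None) -> bool:
    """Decide whether `receiver` may take `msg` off the path."""
    if msg.to_port is not None and msg.to_port != receiver:
        return False         # directed to another process
    if from_port is not None and msg.sender != from_port:
        return False         # receiver asked for one specific sender
    return True


def receive_from_path(path: List[QueuedMessage], receiver: str,
                      from_port: Optional[str] = None) -> Optional[QueuedMessage]:
    """Remove and return the first receivable message, or None if there is none."""
    for index, msg in enumerate(path):
        if receivable(msg, receiver, from_port):
            return path.pop(index)
    return None
```

A result of None corresponds to the case in which the receiver
would either be blocked or proceed asynchronously.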

SUMMARY 

What has been shown in the previous two sections is a simple set
of primitives for process creation and inter-process
communication.

The primitives are configuration independent and do not inhibit
the installation from determining the appropriate logical systems
structures.

There are many models of inter-process communications protocols
which differ in the relation of SEND/RECEIVE to process blocking
and concepts of WAIT, etc. They also differ in whether
intervening mechanisms are visible to communicating processes,
whether message collections survive process destruction, whether
messages may be queued or forced upon receivers, and in
conventions for the concept of reply and response.


This paper has described a design by which simple, direct,
synchronous, transient interprocess communication may be
undertaken without recoverability and integrity. As part of the
same concept, an intervening file or queue may be imposed which
allows many to many, one to many, and one to any interactions
across protected paths. The concept of SIGNAL is a concept of
response. Replies are seen to be undertaken through the issuance
of SENDs at the convenience of a receiver when he wishes to
respond in a meaningful way to a previous message.

The notion of START presented by this paper intends to provide
a mechanism by which processes can initiate other processes and
call for execution on nodes of the system that have various
performance, status, load and scheduling attributes.

A paper under preparation discusses various aspects of the
structure of an operating system that would support the language
constructs discussed here.
Abstract

A reduction language is a functional programming
language whose semantics is defined by
a set of rewrite rules.

Our  paper  describes  the  architecture  of  a 
machine  which  directly  executes  reduction 
language  programs. 

A  laboratory  model  of  this  Reduction  Machine 
has  been  built  at  the  GMD  Bonn  and  is  currently 
used  for  experimental  program  design  based  on 
Berkling's  version  of  a  Reduction  Language. 


Reduction  language  machines  constitute  a  novel 
approach  to  computing  that  is  radically  different 
from  the  conventional  von  Neumann  concept. 

As  the  main  feature  of  reduction  languages  is 
their  strictly  functional  style  of  program  design, 
the  architecture  of  a  Reduction  Language  Machine 
cannot  be  understood  without  having  a  basic 
knowledge  of  the  language  constructs  and  their 
execution. 

A number of papers already deal with this subject,
notably those by J. Backus [BACKUS 72 & 78] and by
K.J. Berkling [BERKLING 76], who originated the
research in this field, and by F. Hommes
[HOMMES 77 & 79], who implemented the first
simulation model of a Reduction Machine.

However,  it  is  thought  helpful  for  the  reader  of 
this  paper  to  be  briefed  on  the  Reduction  Language 
with  particular  emphasis  on  the  aspects  that  are 
relevant  to  an  appropriate  machine  organisation. 

The  paper  outlines  a  few  basic  Reduction 
Language  constructs,  their  rules  of  execution,  and 
the  machine  features  that  adequately  support  the 
processing  of  Reduction  Language  expressions. 

Then we give an overview of the machine
organisation and its operating principles, and a
functional description of a hardware model of the
Reduction Machine which has been constructed at the
GMD [KLUGE 79].


As the Reduction Language is supposed to permit a
strictly functional method of program design, its
most fundamental construct is of the form

    apply function to argument

The components of this expression map onto a
binary tree with 'function' and 'argument'
appearing in the left and right subtree,
respectively, and with the 'apply to' as root node:

          apply to
          /      \
    function    argument          Fig. 1

In general, 'function' and 'argument' are
non-trivial tree-structured expressions. The
'apply to' is a constructor which relates two
subexpressions in some meaningful way to each
other.

More rigorously, an expression e of the Reduction
Language is defined as e := con e1 e2, which is the
preorder notation of the tree

        con
        /  \
      e1    e2          Fig. 2

that links, by means of the constructor 'con', two
subexpressions 'e1' and 'e2' to each other to form
'e'.

The simplest expressions are atoms, such as
primitive function symbols, letter strings of any
finite length representing variables, or strings of
decimal digits which form decimal numbers.

Using  this  basic  structure  of  Reduction  Language 
expressions,  a  language  designer  would  have  to 
establish  a  set  of  primitive  functions,  data  types 
and  constructors,  which  must  be  complete  in  the 
sense  that  every  computational  problem  can  be 
formulated  by  a  systematic  application  of  these 
primitives. 

In this paper, we do not discuss the development
of such a complete language but introduce only a
particular tree-processing primitive of a special
Reduction Language [HOMMES 79] to show the basic
operating principle of the machine: let '>' be a
constructor which builds binary trees, i.e. '> A B'
is the tree

        >
       / \
      A   B          Fig. 3

with 'A' as left and 'B' as right subexpression.

Let 'head' be a primitive function which selects
the left subtree of such a binary tree, i.e.

    apply head to > A B

results in 'A'; this transformation of an
expression to another expression of the same
meaning is called reduction.
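As a rough illustration of the language level only (not of the
machine), the expression format and the 'head' rule can be
written down in Python. The nested-tuple encoding and the names
preorder and reduce_head are assumptions made for this sketch;
'ap' and 'hd' abbreviate 'apply to' and 'head'.

```python
def preorder(expr):
    """Flatten an expression (con, e1, e2) or an atom into preorder symbols."""
    if isinstance(expr, tuple):
        con, e1, e2 = expr
        return [con] + preorder(e1) + preorder(e2)
    return [expr]


def reduce_head(expr):
    """Apply the rule  ap hd (> x y)  ->  x  at the root, if it matches."""
    if (isinstance(expr, tuple) and expr[0] == "ap" and expr[1] == "hd"
            and isinstance(expr[2], tuple) and expr[2][0] == ">"):
        return expr[2][1]
    return expr            # not reducible by this rule: left unchanged


example = ("ap", "hd", (">", "A", "B"))
print(preorder(example))      # ['ap', 'hd', '>', 'A', 'B']
print(reduce_head(example))   # A
```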


The basic machine functions that are necessary to
execute Reduction Language expressions may be
readily derived from what has been said about the
language primitives in the previous section.
Roughly speaking, there must be means to

-  represent a Reduction Language expression in
   a suitable storage medium so that its tree
   structure is uniquely exhibited;

-  perform a preorder traversal of the
   expression stored within this medium;

-  recognize, within the immediate environment
   of the actual traversal position, the
   occurrence of a reducible subexpression;

-  execute the reduction according to the
   meaning of the respective primitive
   expressions (which primarily involves
   traversal functions such as the comparison,
   deletion, insertion, and copying of
   subexpressions);

-  resume, after the completion of a reduction,
   the traversal up to the topmost root node of
   the expression tree.

The first two problems were solved by representing
the Reduction Language expressions in the preorder
notation 'con e1 e2', and storing them in a
push-down stack, with the root node symbol on the
top: so, the expression-tree

        apply to
        /      \
     head       >
               / \
              A   B          Fig. 4

is represented as 'ap(ply) h(ea)d (to) > A B' in
preorder and stored in a stack as

    | ap | hd | > | A | B |          Fig. 5


Since the preorder traversal scheme requires
that the root node is inspected first, followed by
the traversal of the left subtree in preorder,
followed by the traversal of the right subtree in
preorder, it simply takes a succession of
pop-operations to have the expression emerge from
the stack in the desired sequence, with the item on
top of the stack being the actual traversal
position.

A SINK-stack must be provided into which all
symbols popped out of the first SOURCE-stack must
be pushed in order to conserve the expression
during the traversal. The expression ending up in
the SINK-stack is supposed to appear with the root
node symbols on top of their respective
subexpressions. To accomplish this, a third stack
is required as an intermediate storage for
constructors since they emerge from the
SOURCE-stack ahead of their subexpressions but must
enter the SINK-stack after them.

The corresponding traversal algorithm brings
about the following phases with regard to the
contents of the stacks E as SOURCE-stack, A as
SINK-stack, and M as intermediate stack. Initially,
the expression resides in the E-stack; the stacks A
and M are empty, and the topmost item on E is
inspected:


[Fig. 6: E-stack holds ap hd > A B with 'ap' on top;
stacks A and M are empty]

As the item is a constructor, it is transferred
into the M-stack and marked with the superscript
'l', which indicates that the left subexpression of
this constructor is now going to be moved from the
E-stack to the A-stack:

[Fig. 7: E-stack: hd > A B; A-stack: empty;
M-stack: ap marked 'l']

The focus of control returns to the top of the
E-stack and moves the atom 'hd' into the A-stack:

[Fig. 8: E-stack: > A B; A-stack: hd;
M-stack: ap marked 'l']

Then the focus of control turns to the M-stack.
The 'ap' on top of the M-stack is found to be
marked with an 'l'; as its left subexpression has
just been moved over to the A-stack, the marking is
changed to 'r', indicating that now its right
subtree is on top of the E-stack:

[Fig. 9: E-stack: > A B; A-stack: hd;
M-stack: ap marked 'r']

The top element of the E-stack is a constructor '>'
which is put into the M-stack and marked
with an 'l':

[Fig. 10: E-stack: A B; A-stack: hd;
M-stack: > marked 'l' on top of ap marked 'r']

Then the left subtree 'A' of '>' is moved into the
A-stack and the constructor '>' is marked
with an 'r':

[Fig. 11: E-stack: B; A-stack: hd A with 'A' on top;
M-stack: > marked 'r' on top of ap marked 'r']

After the atom 'B' has been moved into the A-stack,
the constructor '>' is found to be marked with an
'r' and can be pushed into the A-stack to complete
the traversal of the subtree '> A B':

[Fig. 12: E-stack: empty; A-stack: hd A B > with '>'
on top; M-stack: ap marked 'r']

The constructor 'ap' which appears now on top of
the M-stack is found to be marked with an 'r'. As
its right subexpression has just been moved into
the A-stack, the constructor 'ap' must be popped
out of M and pushed into stack A:

[Fig. 13: A-stack: hd A B > ap with 'ap' on top;
stacks E and M are empty]

This completes the traversal since the stacks E and
M are empty and the expression is lined up in the
SINK-stack A in a transposed preorder form, with
the left and right subexpression interchanged.

The execution of the same traversal algorithm with
A as SOURCE- and E as SINK-stack reestablishes the
original situation shown in Fig. 6.

There are two important things that need to be
noticed:

-  The manipulation of the stack contents
   splits into two phases. First the item
   which constitutes the focus of control, the
   top of either the E-stack or the M-stack,
   is inspected. Then this item becomes the
   subject of a stack operation, which is
   either a transfer to another stack or a
   write-operation on the same stack.

-  The constructor on top of the M-stack
   controls the movement of its
   subexpressions; moreover, there is a
   situation where 'ap' is on top of the
   M-stack, a function symbol is on top of the
   A-stack, and the argument expression is on
   top of stack E:

[Fig. 14: M-stack: ap on top; A-stack: function symbol
on top; E-stack: argument expression on top]

This property of the traversal scheme serves to
recognize reducible expressions.

In our example, the traversal scheme brings
about a situation where 'ap' appears on top of
stack M and the function 'hd' on top of stack A.
This situation may be readily detected by
simultaneously watching the tops of the
E-, A- and M-stack during the execution of the
preorder traversal.
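The three-stack traversal and the detection of the reducible
situation can be imitated in software. The sketch below is a
simplified model, not the machine's control logic: Python lists
stand in for the stacks (last element = top of stack), the marks
'l' and 'r' are stored next to each constructor in M, and the
names traverse, CONSTRUCTORS and watch are hypothetical.

```python
CONSTRUCTORS = {"ap", ">"}   # the two binary constructors of the example


def traverse(source, sink, on_step=None):
    """Move one complete expression from `source` to `sink` in preorder."""
    m = []                                   # intermediate stack M
    while True:
        symbol = source.pop()
        if symbol in CONSTRUCTORS:
            m.append([symbol, "l"])          # its left subexpression comes next
        else:
            sink.append(symbol)              # atoms go straight to the sink
            while m and m[-1][1] == "r":     # both subexpressions are done:
                sink.append(m.pop()[0])      # the constructor follows them
            if m:
                m[-1][1] = "r"               # now the right subexpression comes
        if on_step:
            on_step(source, sink, m)
        if not m:                            # M empty: expression is complete
            return


E = ["B", "A", ">", "hd", "ap"]              # 'ap hd > A B', root symbol on top
A = []
redex_seen = []


def watch(source, sink, m):
    # 'ap' on top of M, 'hd' on top of A, '>' on top of E
    if m and m[-1][0] == "ap" and sink[-1:] == ["hd"] and source[-1:] == [">"]:
        redex_seen.append(True)


traverse(E, A, watch)
print(A)                                     # ['hd', 'A', 'B', '>', 'ap']
```

Calling traverse(A, E) afterwards restores the original contents
of E, which mirrors the remark that the same algorithm with the
roles of SOURCE and SINK exchanged reestablishes the initial
situation.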

If  an  instance  of  a  reduction  rule  occurs,  the 
traversal  is  immediately  suspended  and  control 
switches  to  another  algorithm  which  performs  the 
appropriate  reduction  steps. 

The reduction algorithm calls other algorithms
which participate in the evaluation of the
particular subexpression.

This transfer of control is accomplished by
conventional methods of subroutine stacking: code
words representing the algorithms are, in their
order of activation, pushed into a system control
stack S, and popped out upon termination so that
control eventually returns to the original
traversal algorithm.


The reduction of an expression involves rather
simple primitive operations like the deletion of an
expression, which may be viewed as a traversal
without a SINK-stack; copying, which is a traversal
with one SOURCE-stack and two SINK-stacks; and
comparison, which is a traversal with two
SOURCE-stacks and one SINK-stack.

For instance, the reduction of the expression in
Fig. 14 can be done as follows: first, the primitive
function 'hd' in the A-stack, the constructor 'ap'
in the M-stack and the tree-constructor '>' on top
of the E-stack are deleted; then the atom 'A' is
moved to the A-stack, the atom 'B' is deleted and
'A' is moved back into the E-stack; so,
'apply head to > A B' is reduced to 'A'.

Of course, this procedure also works properly if
'A' and 'B' are not only atoms but trees.
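The steps of this reduction can be replayed on the list model of
the stacks. This is a sketch under assumptions: move is a
hypothetical helper that traverses one complete expression (and
deletes it when no sink is given), it keeps its own local
intermediate stack instead of sharing M, and
reduce_head_on_stacks is likewise an illustrative name.

```python
CONSTRUCTORS = {"ap", ">"}


def move(source, sink=None):
    """Traverse one complete expression off `source`.

    With a sink the expression is moved; without one it is deleted.
    """
    m = []                                   # local intermediate stack
    while True:
        symbol = source.pop()
        if symbol in CONSTRUCTORS:
            m.append([symbol, "l"])
        else:
            if sink is not None:
                sink.append(symbol)
            while m and m[-1][1] == "r":
                constructor = m.pop()[0]
                if sink is not None:
                    sink.append(constructor)
            if m:
                m[-1][1] = "r"
        if not m:
            return


def reduce_head_on_stacks(e, a, m):
    """Reduce 'ap hd > X Y' to 'X' when 'ap' is on M, 'hd' on A, '>' on E."""
    assert m[-1][0] == "ap" and a[-1] == "hd" and e[-1] == ">"
    a.pop()          # delete 'hd'
    m.pop()          # delete 'ap'
    e.pop()          # delete '>'
    move(e, a)       # X goes to the A-stack
    move(e)          # Y is deleted
    move(a, e)       # X goes back to the E-stack


e, a, m = ["B", "A", ">"], ["hd"], [["ap", "r"]]
reduce_head_on_stacks(e, a, m)
print(e)             # ['A']
```

Because move handles whole subexpressions, the same call sequence
also covers the case in which 'A' and 'B' are trees.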


Other important algorithms include those for
performing arithmetic operations on decimal numbers
of any finite length. In this case, two atomic
subexpressions representing the operands must,
symbol by symbol, be popped out of their respective
SOURCE-stacks and moved through an arithmetic unit
whose output is pushed into the SINK-stack.


To provide sufficient space for expression
manipulation it is convenient to have more than the
stacks E, A, M and S available: so, the machine has
another three stacks named B, U and V to store
expressions.


An  expression  is  manipulated  only  by  push,  pop, 
read  or  write  operations  affecting  the  items 
residing  on  top  of  the  stacks. 

There is no addressing of objects within the
expression involved: they may become the focus of
attention only through an orderly traversal of the
expression tree which brings them to the top of one
of the stacks. Addresses are used only to identify
the stacks that are to be operated upon in a
particular instance.


Control over the stacks is exercised by means of
the Reduction Unit which may be considered as the
processing unit of the machine. The overall
function of the Reduction Unit is very simple.
Under the control of the algorithm residing on top
of the system control stack S, it inspects the
topmost symbols of one or two selected stacks.
Thereupon, it goes through a decision process
(realized by combinatorial logic networks) as a
result of which it may issue new symbols and
specify stacks which are to be pushed, popped,
written into and read next. A small sequential
network, comprising some status flipflops,
navigates the machine through the sequence of
actions required by the reduction process.

More specifically, the Reduction Unit provides
all the facilities to perform the various traversal
algorithms, to recognize instances of reductions,
and to execute the reductions, including an
arithmetic unit for arithmetic operations on
decimal numbers.

There is also an I/O-Processor which loads
expressions into the machine and unloads them after
reduction, and via which the user may exercise
control over the machine.

An elementary cycle of operation within the
Reduction Machine quite naturally partitions into
four phases as illustrated below:

    phase (1)                     phase (2)
    Reduction Unit                Transfer of
    analyses stack      ---->     symbols from the
    symbols                       Reduction Unit
                                  to the stacks
        ^                             |
        |                             v
    phase (4)                     phase (3)
    Transfer of                   Operations
    symbols from the    <----     on the
    stacks to the                 stacks
    Reduction Unit
                                              Fig. 15

Starting in phase (1), the Reduction Unit is about
to analyse what it has just read from the selected
stacks. Then the machine enters phase (2) during
which push, pop, read and write control signals,
together with new symbols, are transferred from the
Reduction Unit to the stacks.

During phase (3), up to four stacks can be
pushed and popped such that new symbols appear in
their topmost positions at the end of this phase.
During phase (4), the topmost items of the stacks
which have been selected for a read operation are
moved into the Reduction Unit which again enters
phase (1).
The hardware model of the Reduction Machine was
primarily intended to demonstrate the feasibility
of the Reduction Language principles. Its design
was largely determined by the objective of getting
a simple and reliable machine into operation as
quickly as possible.

The machine employs standard low-power Schottky
TTL technology for all logic circuits, registers,
status flipflops etc., fast read-only-memory
devices for the realization of a control store in
which the reduction algorithms are implemented, and
dynamic random access memory chips for the
realization of the stacks.


All machine operations are under the control of
a central clock which subdivides a machine cycle
into eight intervals of equal length. As the clock
runs at a frequency of 6.25 MHz, an interval lasts
160 nsec and a machine cycle lasts 1.28
microseconds. The effective speed of operation,
however, is slightly slower since every 16th
machine cycle is used for a refresh operation on
all stacks.
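The timing figures can be checked with a short calculation. The
sketch assumes that a refresh occupies exactly one full machine
cycle out of every 16, which is how the text reads; the effective
cycle time it derives is not a figure quoted in the paper, and the
variable names are illustrative.

```python
CLOCK_HZ = 6.25e6            # central clock frequency
INTERVALS_PER_CYCLE = 8      # intervals per machine cycle
REFRESH_PERIOD = 16          # every 16th cycle is a refresh cycle

interval_ns = 1e9 / CLOCK_HZ
cycle_us = INTERVALS_PER_CYCLE * interval_ns / 1e3

# Only 15 of every 16 machine cycles do useful work.
useful_fraction = (REFRESH_PERIOD - 1) / REFRESH_PERIOD
effective_cycle_us = cycle_us / useful_fraction

print(f"interval:        {interval_ns:.0f} ns")
print(f"machine cycle:   {cycle_us:.2f} microseconds")
print(f"useful fraction: {useful_fraction:.4f}")
print(f"effective cycle: {effective_cycle_us:.3f} microseconds")
```

Under this assumption the refresh costs about 6 percent of the
available machine cycles.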


A block diagram of the hardware architecture is
shown in Fig. 16. It comprises the Reduction Unit
(which is subdivided into four subunits named
TRANS, REDREC, REDEX, ARITH), a set of seven
pushdown stacks, a bus system which handles the
traffic of symbols and control-signals between the
Reduction Unit and the stacks, a central timing
system CTS, and an I/O-Processor (a conventional
INTEL SBC 80/20 single board computer) which also
performs some monitoring and preprocessing
functions.

The data paths within the entire machine are
laid out to accommodate byte formats (eight bits
plus parity), i.e. all stacks, data busses and
Reduction Unit circuits are one byte wide.


The Reduction Unit comprises four modules, each
of which is accommodated by a separate printed
circuit board:

-  TRANSport performs all traversal algorithms
   (including deletion, comparison, copying);

-  REDuction RECognition is a combinatorial
   logic network that looks, during the
   traversal of an expression, for the
   appearance of an instance of a reduction.
   Upon the detection of a reducible expression,
   REDREC immediately deactivates the
   TRANS-subunit, pushes a new algorithm-code on
   top of the S-stack and turns control over to

-  REDuction EXecution, which essentially
   comprises a fast control memory containing
   all the control programs which are required
   to perform the reductions. As for arithmetic
   operations, REDEX is supported by the

-  ARITHmetic unit, which performs the arithmetic
   operations on the decimal numbers which,
   under the control of REDEX, are received
   digit by digit from the respective SOURCE
   stacks; the resulting digits are sent back to
   the SINK stack.
Fig. 16: Block Diagram of the Reduction Machine Architecture

A stack is schematically shown in Fig. 17. The
major components are the random access memory, a
stack pointer to the actual top-of-stack location,
a separate TOP OF STACK register in which the
actual topmost item resides, and a COPY-register in
which a copy of the contents of the TOP-register is
held.


The TOP-register may receive a data item, via
the multiplex circuit INBUSSELECT, from one of
three sources: the KBUS, the LBUS, or the memory
cell that is addressed by the stack pointer. The
contents of the TOP-register may be supplied to the
KBUS or LBUS via the multiplex circuit
OUTBUSSELECT.

The stack operations are as follows: the
TOP-register contains the topmost data item of
the stack, a copy of which is in the COPY-register;
the stackpointer addresses the first empty cell of
the random access memory stack area.



Upon a push operation, an item enters via
INBUSSELECT from KBUS or LBUS and is written into
the TOP-register. Subsequently, the contents of the
COPY-register, i.e. the old top of the stack, are
stored away in the empty cell addressed by the
stackpointer. Afterwards, the stackpointer is
incremented by one to point again to the first
empty cell, and the contents of the TOP-register
are copied into the COPY-register.

Conversely, if the stack is to be popped, the
stackpointer is first decremented by one to point
to the last occupied cell, then the contents of
this memory cell are read out and written into the
TOP-register, whose new value is copied into the
COPY-register.

Read  and  write  operations  affect  only  the 
contents  of  the  TOP-  and  COPY-registers  and  cause 
no  memory  access  cycle. 
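The register-level behaviour of a stack can be modelled in a few
lines. The class below is an illustrative software model, not a
description of the actual circuits: the name HardwareStack, the
memory size and the access counter are assumptions, and pop
returns the displaced item only for convenience.

```python
class HardwareStack:
    """Stack whose topmost item lives in a TOP register outside the memory."""

    def __init__(self, size=16):
        self.memory = [None] * size   # random access memory stack area
        self.sp = 0                   # stackpointer: first empty cell
        self.top = None               # TOP-register
        self.copy = None              # COPY-register
        self.memory_accesses = 0      # counts memory access cycles

    def push(self, item):
        self.top = item                     # new item is written into TOP
        self.memory[self.sp] = self.copy    # old top is stored in the empty cell
        self.memory_accesses += 1
        self.sp += 1                        # point to the first empty cell again
        self.copy = self.top

    def pop(self):
        popped = self.top
        self.sp -= 1                        # point to the last occupied cell
        self.top = self.memory[self.sp]     # its contents become the new top
        self.memory_accesses += 1
        self.copy = self.top
        return popped

    def read(self):
        return self.top                     # registers only, no memory access

    def write(self, item):
        self.top = item                     # registers only, no memory access
        self.copy = item


stack = HardwareStack()
stack.push("A")
stack.push("B")
stack.write("C")                            # overwrite the top in place
print(stack.read(), stack.memory_accesses)  # C 2
```

Keeping the topmost item in a register is what lets read and
write operations avoid a memory access cycle in this model.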

Input/output processing and certain system
support functions are handled by a conventional
INTEL SBC 80/20 single board microcomputer which,
via a tailor-made I/O-interface, is attached to the
bus system of the Reduction Machine. The currently
implemented I/O-configuration only supports a
Hewlett Packard HP 2645A data station, which
perfectly suits the purpose of the Laboratory Model:
Reduction Language expressions can be edited,
shipped into the Reduction Machine for the
execution of a user-specified number of reductions,
and displayed afterwards. As the HP 2645A data
station includes two tape cartridge drives, user
expressions and standard library functions may be
stored away to and retrieved from tape.


Fig. 17: Block Diagram of the Reduction Machine Stack
Organisation


Perspective

When assessing its strengths and weaknesses, the
Reduction Machine architecture and its hardware
realization as described in this report should be
seen in the light of the following aspects:


-  the Reduction Machine is the first of its
   kind that directly supports the execution of
   reduction languages; its architecture has
   been straightforwardly derived from the basic
   structure of reduction language expressions
   and their rules of execution;

-  the concept of not using addresses for the
   representation of expressions within the
   Reduction Machine has nowhere been
   compromised;

-  the Reduction Machine was primarily conceived
   as an interactive tool for systematic
   construction of functional programs, serving
   only one user at a time;

-  the hardware model was simply intended as a
   vehicle to demonstrate that the Reduction
   Language concept can be adequately supported
   by the proposed machine architecture; neither
   memory capacity nor performance in terms of
   program runtime were a design objective.
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There  remain  a  number  of  problems  that  need  to 
be  solved  before  the  Reduction  Machine  can  be 
accepted  as  a  competitive  alternative  to  von 
Neumann  computers.  At  the  level  ul  machine 
architecture,  these  problems  concern 


- adequate interfacing with peripheral memory
  devices like disks and tapes to support
  program libraries, serious data base
  applications, and also the concept of
  'virtual stacks', i.e. transparent stack
  extension into secondary storage;

- program-controlled input and output of
  expressions from and to peripheral devices;

- interrupt facilities supporting the
  communication with I/O-Processors, real time
  applications and the cooperation with other
  Reduction Machines;

- measures that remedy a serious performance
  degradation in list processing applications
  which is caused by excessive copying
  activities.


Preliminary  studies  have  shown  that  program 
controlled  I/O  and  interrupt  handling  can  neatly  be 
integrated  into  the  language  concept  by  introducing 
appropriate  constructs. 

As it appears now, the interfacing with
conventional peripherals necessitates traditional
file management methods and data transmission
techniques since device controllers are designed
for standard interfaces with conventional
computers. Hence, the microprocessor approach for
I/O-handling which has been taken with the
Laboratory Model seems to be a step in the right
direction, guided by the type of peripheral devices
that are currently available in the market-place.
However, with future advances in electronic disk
technologies, stack-type peripheral memory devices
of sufficiently large capacity that are compatible
with the internal structure of the Reduction
Machine may be anticipated.

To significantly expedite the processing of
large list structures, the hard-lined
'no-addresses' approach may have to be softened to
some extent. Conceivably, subexpressions could be
linked to their respective constructors by relative
pointers within the internal representation of an
expression. Along these pointers, the focus of
control could be moved directly to a particular
subexpression rather than traversing linearly
through the expression tree that is to the left and
above it.
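
One possible reading of the relative-pointer idea can be sketched in
Python. The sketch assumes that every cell of a preorder representation
additionally stores the length of the subexpression starting there; the
names ARITY, annotate and child_index are hypothetical and are not part
of the hardware model described in this report.

```python
# Hypothetical sketch: preorder cells that carry a relative "skip" distance.
ARITY = {"k1": 2, "k2": 2}  # assumed arities; any other symbol is an atom


def annotate(symbols):
    """Pair each preorder symbol with the length of its subexpression."""
    sizes = [0] * len(symbols)

    def walk(i):
        n = 1
        for _ in range(ARITY.get(symbols[i], 0)):
            n += walk(i + n)
        sizes[i] = n
        return n

    walk(0)
    return list(zip(symbols, sizes))


def child_index(cells, i, which):
    """Index of the which-th (1-based) subexpression of the constructor at i."""
    j = i + 1
    for _ in range(which - 1):
        j += cells[j][1]  # jump over a complete sibling in one step
    return j


cells = annotate(["k1", "a1", "k1", "a2", "k2", "a3", "k2", "a4", "a5"])
second = child_index(cells, 0, 2)  # position of the second subexpression
```

With such distances a sibling is skipped in one step instead of being
traversed symbol by symbol, at the price of storing address-like
information inside the expression.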

It may also be envisaged that such a pointer
structure facilitates the partitioning of an
expression into subexpressions of suitable meaning
that can be distributed for concurrent processing
within a system of cooperating Reduction Machines.

Abstract

A wide class of languages can be defined by using a
constructor syntax. This paper gives a short
introduction to the constructor syntax and describes
an interactive editing system for languages having
such a syntax. In contrast to conventional
line-oriented editors this editing system is completely
expression-oriented. The system has been successfully
implemented for Berkling's Reduction Machine.


Backus introduces in his report [BACKUS 73]
languages with a constructor syntax: The pair (A,K)
is a constructor syntax for a language E if the
following conditions hold:

1. A ⊂ E

2. Each k ∈ K is a function from a subset Sk of
   E^n into E

3. For every e ∈ E, either e ∈ A or there are a
   unique k ∈ K and unique e1,...,en ∈ E such
   that k[e1,...,en] = e.

Each element a of A is called an atom, and each
k ∈ K is called a constructor. Let k[e1,...,en] =
e, then e1,...,en are called subexpressions of the
expression e and k is called an n-place-
constructor. Each expression of a language with a
constructor syntax is either an atom or can be
written as e = k[e1,...,en].

Example 1: Definition of a language

Let A = {a1,...,a5} and K = {k1,k2} with
ki ∈ [E×E --> E], i.e. the ki's are two-place-
constructors. Then (A,K) is a constructor syntax
defining a language which we call E and to which
we will refer in the following chapters. An
example of an expression of the language E is
k1[a1,k1[a2,k2[a3,k2[a4,a5]]]]

Expressions of languages with a constructor syntax
can be represented as trees. Atoms become the
leaves of the trees, whereas the constructors form
the nodes.

      k1
     /  \
   a1    k1
        /  \
      a2    k2
           /  \
         a3    k2
              /  \
            a4    a5

Fig. 1: Tree-representation of the expression of
Example 1.

The language E which has been defined in Example 1
looks very abstract, for we did not associate any
meaning with the atoms or constructors, we just
gave them formal names.

Now we are going to discuss the following two
representations of the language:

1. Its representation within a machine (machine
   interface or internal representation)

2. Its representation on a display station (user
   interface or external representation)

A representation of an expression within a given
machine is obtained by:

1. coding the atoms and constructors

2. mapping the structure of the expressions into
   storage


Each atom or constructor is stored within a memory
cell; the coding function maps the symbolic name of
an atom or constructor into a value which fits into
a memory item, e.g. a1 is mapped to the hexadecimal
constant X'35'.

In the following we will denote the coding of an
atom or a constructor x by $x, i.e. the symbolic
name for the coding of the atom a1 is $a1.

We have already mentioned that each expression
can be represented by a tree; this means that we
have to map a tree-structure into memory. A
convenient way to do that is to connect the elements
by pointers. Figure 2 shows such a realization for
the expression defined by Example 1:

Fig. 2: Representation of an expression by using
pointers.

Berkling has used another method within his
Reduction Machine; the expressions are stored within
stacks, using the preorder notation of the
associated expression-tree. Figure 3 shows how the
expression of Example 1 is stored.

Top of stack

|$k1|$a1|$k1|$a2|$k2|$a3|$k2|$a4|$a5|

Fig. 3: Representation of an expression in a stack
using preorder notation.

In this paper we will prefer the stack
representation since it has the following advantages:

1. The representation is very close to the
   formal definition of expressions, i.e.
   removing brackets and commas from the formal
   definition leads directly to the preorder
   notation (cf. Example 1 and Figure 3).

2. It is free of pointers which are not directly
   related to the problem.

Thus the algorithms of the editing system which we
are going to describe will be more clear and
precise, for they are free from pointer manipulation
and garbage collection problems.
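
The first advantage can be checked with a few lines of Python. This is
only an illustrative sketch: the nested-tuple encoding and the function
name preorder are assumptions made for the example, not part of the
editing system.

```python
# An atom is a string; a constructor application is (name, [subexpressions]).
def preorder(expr):
    """List the symbols of expr in preorder: constructor first, then subexpressions."""
    if isinstance(expr, str):
        return [expr]
    name, subs = expr
    out = [name]
    for sub in subs:
        out.extend(preorder(sub))
    return out


# The expression of Example 1 as a nested structure.
example1 = ("k1", ["a1", ("k1", ["a2", ("k2", ["a3", ("k2", ["a4", "a5"])])])])
symbols = preorder(example1)

# Removing brackets and commas from the formal text gives the same sequence.
formal = "k1[a1,k1[a2,k2[a3,k2[a4,a5]]]]"
stripped = formal.replace("[", " ").replace("]", " ").replace(",", " ").split()
```

Both lists contain the nine symbols in the order in which they appear in
the stack of Figure 3, read from the top of the stack.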


3. External Representation

Normally the user is not interested in the internal
coding of an expression. He wants to see certain
keywords or strings which have a meaning to him.

Therefore we need another function - the
I/O-function - mapping formal expressions into
expressions which can be understood by the user. The
I/O-function can be defined by a table which
associates all atoms with a string and all constructors
with a prototype-expression that consists of some
keywords and place-holders (□) which indicate where
the subexpressions are going to be inserted. In
Figure 4 a possible I/O-table for the language
defined in Example 1 is shown:


a1 = head | a2 = tail

a3 = A | a4 = B | a5 = C

k1 = apply □
     to □

k2 = > □ □

Fig. 4: I/O-table for the language given by
Example 1 (translation to Berkling's Reduction
Language).

Using the I/O-table above the expression of
Example 1 is displayed as:

apply head
to apply tail
   to > A > B C

which is a valid expression for Berkling's
Reduction Language.

Different I/O-tables may exist for the same
formal language. The next figure shows a
translation of formal expressions to LISP:


a1 = CAR | a2 = CDR

a3 = A | a4 = B | a5 = C

k1 = ( □ □ )

k2 = ( □ . □ )

Fig. 5: I/O-table for the language given by Example
1 (translation to LISP).

Using this table results in: (CAR (CDR (A . (B . C))))
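
The effect of an I/O-table can be sketched in Python as follows. The
sketch is an illustration under simplifying assumptions: the tables are
plain dictionaries, the output is produced on a single line, and the
names HOLE, REDUCTION, LISP and render are chosen for this example only.

```python
HOLE = "\u25a1"  # the place-holder symbol used in the prototype-expressions

REDUCTION = {"a1": "head", "a2": "tail", "a3": "A", "a4": "B", "a5": "C",
             "k1": f"apply {HOLE} to {HOLE}", "k2": f"> {HOLE} {HOLE}"}
LISP = {"a1": "CAR", "a2": "CDR", "a3": "A", "a4": "B", "a5": "C",
        "k1": f"({HOLE} {HOLE})", "k2": f"({HOLE} . {HOLE})"}


def render(expr, table):
    """Translate a formal expression into its external representation."""
    if isinstance(expr, str):          # atom: look up its string
        return table[expr]
    name, subs = expr
    text = table[name]                 # prototype-expression of the constructor
    for sub in subs:                   # fill the place-holders from left to right
        text = text.replace(HOLE, render(sub, table), 1)
    return text


example1 = ("k1", ["a1", ("k1", ["a2", ("k2", ["a3", ("k2", ["a4", "a5"])])])])
as_reduction = render(example1, REDUCTION)
as_lisp = render(example1, LISP)
```

The same formal expression yields both external forms; only the table
is exchanged.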

The editing system which we are going to develop
will only be based on the formal definition of
expressions. The external representation of an
expression is generated by using an I/O-table, which
may be a default table supplied by the system, or a
table defined by a user who wants to use his own
external representation of a language.

The next figure shows the relationship between
the different representations of an expression:

Fig. 6: Representations of formal expressions.


II. The Interactive Editing System


1. An Expression Oriented Editing System

Conventional editors are line-oriented, i.e. a line
is the smallest logical unit. Almost all commands
of such an editor refer to lines, e.g. move lines,
copy lines, insert lines, scroll up and down a
certain number of lines. There is no relationship to
the structure of the program which is edited, or to
the language it is written in. If a user wants to
delete a begin-end-block, he has to find the
corresponding lines for deletion. This can be very
tedious if nested blocks are used or the block does
not fit onto the display.

We have implemented an editor which is not
line-oriented but expression-oriented, i.e. the smallest
logical unit the user can handle is a complete
expression. All commands of the editor will refer to
expressions, e.g. copy expressions, move
expressions, delete expressions, scroll to a
subexpression etc.

Since the smallest logical units in our editing
system are complete expressions, we first describe
some basic algorithms which allow us to move, copy
and to delete expressions.

The editor works with five stacks which are
called E, A, M, B, and U. Expressions are stored
within stacks using the representation described in
1.2 (cf. Figure 3).

Fig. 7: Memory used by the Editor.

There are the following primitive procedures and
functions to handle stack elements:

POP(X):        deletes the item on top of stack X

PUSH(I,X):     pushes item I into stack X

MOVE(X,Y):     moves one item from stack X to
               stack Y

MOVE2(X,Y,Z):  moves one item from stack X to the
               stacks Y and Z.

The functions and procedures listed above will not
be explained any further in this paper.

The algorithm TRANSPORT moves a complete expression
from one stack to another stack. A third stack, the
control-stack M, is used for intermediate saving of
constructors. A call of the algorithm TRANSPORT is
denoted by TRANSPORT(X,Y) where X and Y are
stacknames and the expression is moved from the
stack X to the stack Y, i.e. TRANSPORT(E,A) moves
an expression from stack E to stack A.

In the following we will use a PASCAL-like
language to specify algorithms. We assume that
there is a global type-declaration of the stacks:

TYPE STACKNAME = (E,A,M,B,U);

Then the algorithm TRANSPORT is given by

PROCEDURE TRANSPORT(X,Y: STACKNAME);
BEGIN
  CASE TOP(X) OF
    ATOM: MOVE(X,Y);
    N-CONSTRUCTOR: BEGIN
        MOVE(X,M);
        FOR I:=1 TO N
          DO TRANSPORT(X,Y);
        MOVE(M,Y);
      END
  END
END

TRANSPORT is a recursive algorithm: after a
constructor has been saved in the M-stack all its
subexpressions are moved to the sink-stack, then the
constructor is moved from the M-stack to the
sink-stack.

Note: During the transport the subexpressions of
constructors are interchanged, e.g. applying the
algorithm TRANSPORT to the expression shown in
Figure 3 yields:

Top of stack

|$k1|$k1|$k2|$k2|$a5|$a4|$a3|$a2|$a1|

Fig. 8: Result of transporting the expression given
by Figure 3.

Applying the algorithm TRANSPORT repeatedly to an
expression yields the following transformation:

              TRA.                TRA.
k[e1,...,en]  -->  k[en,...,e1]  -->  k[e1,...,en]

i.e. an even number of transports always yields the
original expression.

The algorithm TRANSPORT2 moves an expression from
one stack to two other stacks. It is called by
TRANSPORT2(X,Y,Z), where X, Y, and Z are
stacknames; Z denotes the second stack to which the
expression is moved. The algorithm differs in only
one point from the algorithm TRANSPORT: atoms and
constructors are pushed into two sink-stacks.

PROCEDURE TRANSPORT2(X,Y,Z: STACKNAME);
BEGIN
  CASE TOP(X) OF
    ATOM: MOVE2(X,Y,Z);
    N-CONSTRUCTOR: BEGIN
        MOVE(X,M);
        FOR I:=1 TO N
          DO TRANSPORT2(X,Y,Z);
        MOVE2(M,Y,Z);
      END
  END
END
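
The behaviour of TRANSPORT can be tried out with the following Python
sketch. It is a minimal model under stated assumptions: stacks are
Python lists whose last element is the top of stack, the control stack
is passed explicitly as m, and the table ARITY stands in for the
distinction between atoms and N-constructors.

```python
ARITY = {"k1": 2, "k2": 2}  # assumed arities; every other symbol is an atom


def transport(x, y, m):
    """Move one complete expression from stack x to stack y."""
    top = x[-1]
    if top in ARITY:
        m.append(x.pop())              # MOVE(X,M): save the constructor
        for _ in range(ARITY[top]):
            transport(x, y, m)         # move each subexpression
        y.append(m.pop())              # MOVE(M,Y): constructor ends up on top
    else:
        y.append(x.pop())              # MOVE(X,Y): a single atom


# The stack of Figure 3, written bottom-to-top (the list end is the top).
E = ["a5", "a4", "k2", "a3", "k2", "a2", "k1", "a1", "k1"]
A, M = [], []
original = list(E)

transport(E, A, M)
after_one = list(A)        # subexpressions interchanged, as in Figure 8
transport(A, E, M)         # a second transport restores the original order
```

Read from the top, after_one is k1 k1 k2 k2 a5 a4 a3 a2 a1, and after the
second transport stack E holds the original expression again.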


2.3. The Algorithm COPY

This algorithm copies an expression from one stack
to another stack without interchanging the
subexpressions. COPY has the same parameters as the
algorithm TRANSPORT, i.e. COPY(A,E) copies an
expression from stack A to stack E. The
algorithm COPY uses the algorithms TRANSPORT2 and
TRANSPORT:

PROCEDURE COPY(X,Y: STACKNAME);
VAR Z1,Z2: STACKNAME;
BEGIN
  Z1 := ...; Z2 := ...;
  TRANSPORT2(X,Z1,Z2);
  TRANSPORT(Z1,X);
  TRANSPORT(Z2,Y);
END

The stacks Z1 and Z2 are used as scratch pad stacks
for expressions. They must be different from the
stacks X and Y.
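
A compact Python sketch of COPY is given below. As an assumption made
for brevity, a single function transport with a list of sink stacks
plays the role of both TRANSPORT (one sink) and TRANSPORT2 (two sinks);
stacks are Python lists with the top at the end, and ARITY is a
hypothetical arity table.

```python
ARITY = {"k1": 2, "k2": 2}  # assumed arities; every other symbol is an atom


def transport(x, sinks, m):
    """Move one expression from x onto every stack in sinks."""
    top = x[-1]
    if top in ARITY:
        m.append(x.pop())
        for _ in range(ARITY[top]):
            transport(x, sinks, m)
        item = m.pop()
    else:
        item = x.pop()
    for s in sinks:
        s.append(item)


def copy(x, y, m):
    """Copy the expression on top of x to y without interchanging subexpressions."""
    z1, z2 = [], []               # scratch pad stacks, distinct from x and y
    transport(x, [z1, z2], m)     # two interchanged copies
    transport(z1, [x], m)         # put the source back in its original order
    transport(z2, [y], m)         # deliver the copy in the original order


A = ["a5", "a4", "k2"]            # k2[a4,a5], top of stack at the list end
E, M = [], []
copy(A, E, M)
```

Each copy passes through exactly two transports, which is why neither
the source nor the destination shows interchanged subexpressions.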

2.4. The Algorithm DELETE

The algorithm DELETE removes a complete expression
from a stack. It has only one parameter which is
the name of the stack where the expression is to be
deleted, i.e. DELETE(E) deletes an expression in
stack E:

PROCEDURE DELETE(X: STACKNAME);
BEGIN
  CASE TOP(X) OF
    ATOM: POP(X);
    N-CONSTRUCTOR: BEGIN
        POP(X);
        FOR I:=1 TO N DO DELETE(X);
      END
  END
END

The algorithms described above have been
implemented by hardware in Berkling's Reduction Machine.
A description is given in [KLUGE 79].
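
DELETE translates almost literally into Python. The sketch below assumes
lists as stacks (top at the end) and a hypothetical arity table ARITY;
it shows that exactly one complete expression is removed, however deeply
it is nested.

```python
ARITY = {"k1": 2, "k2": 2}  # assumed arities; every other symbol is an atom


def delete(x):
    """Remove the complete expression on top of stack x."""
    top = x.pop()                       # POP(X)
    for _ in range(ARITY.get(top, 0)):  # an atom has no subexpressions
        delete(x)


# Two expressions on one stack: the atom a1 at the bottom and
# k1[a2,k2[a3,a4]] on top of it (bottom-to-top notation).
E = ["a1", "a4", "a3", "k2", "a2", "k1"]
delete(E)
```

After the call only the atom a1 is left on the stack.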


We have already mentioned that the editor works
with five stacks which are called E, A, M, B, and
U. Stack E contains the expression which is
displayed to the user. We call this expression Focus
of Attention (FA). Stack M is the control stack
which is used by the TRANSPORT-algorithm. Stack B
is used for input and output, i.e. input operations
move an expression from the display station to ...

Fig. 9: Output of an expression.

The algorithm OUTPUT is a modified TRANSPORT-
algorithm. It is defined by

PROCEDURE OUTPUT;
BEGIN
  CASE TOP(B) OF
    ATOM: DISPLAY;
    N-CONSTRUCTOR: BEGIN
        DISPLAY;
        FOR I:=1 TO N DO OUTPUT;
      END
  END;
  ERROR: BEGIN
      DELETE(B); ABBREVIATE;
    END;
END

The procedure DISPLAY pops one item out of stack B,
retrieves its representation from the I/O-table
(cf. 1.3), and replaces the associated placeholder
within the display-buffer by the representation.
Before the algorithm OUTPUT is called, the display
buffer is cleared and one placeholder is inserted.
Figure 10 shows the different states of the
algorithm OUTPUT, using the I/O-table given in
Figure 4 and an expression consisting of a
two-place constructor applied to an atom and to a
second constructor expression with two atoms.

The procedure DISPLAY fails if there is not enough
space within the display buffer to insert the
representation of an atom or a constructor. In this
case the error exit is taken: The corresponding
subexpression in stack B is deleted and the
algorithm ABBREVIATE replaces the current
placeholder by an abbreviation symbol. This means
that the innermost subexpressions are automatically
abbreviated and the complete Focus of Attention is
shown on the display.

5. Scrolling and Displaying Selected Subexpressions

In this chapter we will describe how the user can
change the Focus of Attention in order to look at
subexpressions which have been abbreviated.
Line-editors can display hidden information by means of
scrolling commands: display previous page, display
next page, scroll up n lines etc., i.e. scrolling
is completely line-oriented. For our purpose we
need a scrolling mechanism which is
expression-oriented since the hidden information always
consists of complete subexpressions.

But the problem is how to select a subexpression
on the display and how to find the corresponding
subexpression within the expression in the E-stack.
The solution we are looking for should be
independent of the current I/O-table that is used for
displaying expressions; it should only depend on the
constructor syntax.

An easy way to select a subexpression is to move
the cursor to its position on the display. But
cursor-addresses are not expression dependent, they
are just given by a line and column number. So we
have to translate the cursor-address into an
appropriate expression-address. In our editing
system these 'appropriate' addresses are themselves
expressions taken from a special address language
LADDR. The language LADDR is defined by the
following constructor-syntax:

Let A = {1,2,...} U {nil}, i.e. an atom is
either a natural number or nil, and let K =
{K2ADDR} where K2ADDR is a two-place constructor.
Then the editor will use the following expressions
of LADDR as addresses of expressions or
subexpressions:

1. The root of an expression gets the address
   nil

2. The i'th subexpression gets the address
   K2ADDR[i,ADDR] where ADDR is the address of
   the current expression

In order to make addresses more readable we use the
following representation for the constructor
K2ADDR: K2ADDR[X,Y] = X.Y

The next figure shows the expression of Figure
10, where each subexpression has been marked with
its address.

            k
          (nil)
         /     \
       a         k
   (1.nil)    (2.nil)
              /     \
            a         a
      (1.2.nil)   (2.2.nil)

Fig. 11: Expression and its addresses.


Note: All addresses terminate with the special atom
nil. Readers who are familiar with LISP probably
have noticed that expression-addresses are
represented by a list of integers. As expression-
addresses are based on a constructor syntax, the
basic algorithms TRANSPORT, COPY, etc. may also be
applied to them. Besides these we will need some
other algorithms to handle addresses:

HEAD(ADDR):     extracts the first number from an
                address, i.e. HEAD(1.2.3.4.nil) = 1

TAIL(ADDR):     removes the first number from an
                address, i.e. TAIL(1.2.3.4.nil) =
                2.3.4.nil

REVERSE(ADDR):  reverses the sequence of the
                numbers that constitute an
                address, i.e. REVERSE(1.2.3.4.nil)
                = 4.3.2.1.nil

These algorithms can be expressed by using the
basic transport algorithm and the operations POP,
PUSH, and MOVE.
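
The address operations can be illustrated in Python. Note the
substitution: instead of the stack-based formulation mentioned above,
this sketch simply models an address as a tuple of integers with the
empty tuple standing for nil; the function names k2addr and show are
assumptions made for the example.

```python
NIL = ()  # the atom nil, modelled as the empty tuple


def k2addr(i, addr):
    """K2ADDR[i,ADDR]: address of the i'th subexpression of the expression at addr."""
    return (i,) + addr


def head(addr):
    return addr[0]


def tail(addr):
    return addr[1:]


def reverse(addr):
    return addr[::-1]


def show(addr):
    """Readable X.Y notation, always terminated by nil."""
    return ".".join([str(n) for n in addr] + ["nil"])


addr = k2addr(1, k2addr(2, k2addr(3, k2addr(4, NIL))))
```

Here show(addr) gives 1.2.3.4.nil, and head, tail and reverse reproduce
the three examples listed above.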

Given a reversed expression-address, we can
define an algorithm SCROLLDOWN which selects the
corresponding subexpression. Basically, SCROLLDOWN
is a transport-algorithm which moves an expression
from stack E to stack A, but the transport is
stopped as soon as the selected subexpression is on
top of stack E:

PROCEDURE SCROLLDOWN(ADDR: EXPRESSION-ADDRESS);
BEGIN
  IF NOT (ADDR = nil)
  THEN BEGIN
      MOVE(E,M);
      I:=1;
      WHILE I < HEAD(ADDR)
        DO BEGIN TRANSPORT(E,A); I:=I+1; END;
      SCROLLDOWN(TAIL(ADDR));
    END
END

When the SCROLLDOWN algorithm stops, the stacks A
and M contain the environment of the selected
subexpression. Stack M contains all the constructors
which have been encountered when walking to the
subexpression, whereas stack A contains all the
subexpressions which have been removed in order to
get the subexpression on top of stack E.

After having selected a subexpression and after
having performed some actions on it the user may
want to return to the expression from where
scrolling was invoked. This is done via the
algorithm SCROLLUP which is the inverse of the
algorithm SCROLLDOWN:

PROCEDURE SCROLLUP(ADDR: EXPRESSION-ADDRESS);
BEGIN
  IF NOT (ADDR = nil)
  THEN BEGIN
      I:=1;
      WHILE I < HEAD(ADDR)
        DO BEGIN TRANSPORT(A,E); I:=I+1; END;
      MOVE(M,E);
      SCROLLUP(TAIL(ADDR));
    END
END

Before the algorithm SCROLLUP is called the
expression-address is not reversed. SCROLLUP moves
the subexpressions and constructors having been
moved to the stacks A and M back to stack E, thus
reconstructing the original expression again.
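
The interplay of the two scrolling algorithms can be sketched in Python.
The sketch rests on several assumptions: stacks are lists with the top
at the end, addresses are tuples of integers with the empty tuple for
nil, ARITY is a hypothetical arity table, and both procedures stop when
the address is nil.

```python
ARITY = {"k1": 2, "k2": 2}  # assumed arities; every other symbol is an atom


def transport(x, y, m):
    """Move one complete expression from stack x to stack y."""
    top = x[-1]
    if top in ARITY:
        m.append(x.pop())
        for _ in range(ARITY[top]):
            transport(x, y, m)
        y.append(m.pop())
    else:
        y.append(x.pop())


def scroll_down(addr, e, a, m):
    """addr is the reversed address, i.e. the path from the root downwards."""
    if addr:
        m.append(e.pop())              # MOVE(E,M): remember the constructor
        for _ in range(addr[0] - 1):   # move the left-hand siblings away
            transport(e, a, m)
        scroll_down(addr[1:], e, a, m)


def scroll_up(addr, e, a, m):
    """addr is the address as generated, i.e. innermost number first."""
    if addr:
        for _ in range(addr[0] - 1):   # bring the left-hand siblings back
            transport(a, e, m)
        e.append(m.pop())              # MOVE(M,E): restore the constructor
        scroll_up(addr[1:], e, a, m)


# k1[a1,k2[a3,a4]] in bottom-to-top notation; a3 has the address 1.2.nil.
E = ["a4", "a3", "k2", "a1", "k1"]
A, M = [], []
original = list(E)
address = (1, 2)

scroll_down(address[::-1], E, A, M)
selected, env_a, env_m = E[-1], list(A), list(M)
scroll_up(address, E, A, M)
```

After scrolling down, the selected atom a3 is on top of E while A and M
hold its environment; scrolling up rebuilds the original expression.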

The editing system also supports nested
scrolling: Whenever the algorithm SCROLLDOWN is
called the associated expression-address is moved
to stack U. A sequence of scroll-downs then
generates a sequence of expression numbers within stack
U. When scroll-up is requested the required
expression-address is found on top of stack U from
where it is removed.

Now there is one problem left: Which expression-
address belongs to which cursor-address? This
relationship is established via algorithm OUTPUT
which is extended in the following way: For each
atom and for each constructor the corresponding
expression-address is generated:

PROCEDURE OUTPUT(ADDR: EXPRESSION-ADDRESS);
BEGIN
  CASE TOP(B) OF
    ATOM: DISPLAY;
    N-CONSTRUCTOR: BEGIN
        DISPLAY;
        FOR I:=1 TO N
          DO OUTPUT(I.ADDR);
      END
  END
  ERROR: BEGIN
      DELETE(B);
      ABBREVIATE;
    END
END

When algorithm OUTPUT is called for the first time
ADDR should be nil, i.e. OUTPUT(nil) is a valid
call.

The procedure DISPLAY of algorithm OUTPUT has to
be extended, too. First of all we need in addition
to the display buffer a second buffer which we will
call address table. The address table has as many
entries as characters can be displayed on the
display. Each entry contains the address of an
expression-address. Now, the procedure DISPLAY will
update both the display buffer and the address
table: Whenever the representation of an item is
moved to the display buffer, the corresponding
entries within the address table will receive the
address of the current expression-address. Figure
12 shows the contents of the address table for the
different phases of algorithm OUTPUT for the
example given in Figure 10.

where:  1 --> nil      4 --> 1.2.nil
        2 --> 1.nil    5 --> 2.2.nil
        3 --> 2.nil

Fig. 12: Contents of the address table for Figure
10.

No entry means that at the corresponding position
of the display no expression is shown. The
algorithm ABBREVIATE will insert the address of the
expression number of the abbreviated expression
into the address table.

Now we are able to translate a cursor-address
into an expression-address: The cursor-address
denotes an offset within the address table, where
we find the address of the associated expression-
address:


CURSOR-ADDRESS  -->  ADDRESS TABLE  -->  EXPRESSION-ADDRESS

Fig. 13: Association of cursor- and expression-
address

The existence of an address table allows an
expression oriented use of some standard display-
station keys, e.g. the EOF-key (= Erase until end
Of Field) can be changed to a more useful EOE-key
(= Erase until end Of Expression). An expression is
erased by erasing all screen-positions whose
expression-addresses have the same suffix as the
current address of the expression.
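
The address table and the EOE-key can be illustrated by the following
Python sketch. It is a simplification under stated assumptions: the
display is a single line, each displayed character is stored together
with its expression-address (a tuple, the empty tuple being nil), the
error exit with ABBREVIATE is left out, and the names HOLE, TABLE,
output and erase are chosen for this example only.

```python
HOLE = "\u25a1"  # place-holder symbol of the I/O-table
TABLE = {"a1": "head", "a3": "A", "a4": "B",
         "k1": f"apply {HOLE} to {HOLE}", "k2": f"> {HOLE} {HOLE}"}


def output(b, addr=()):
    """Pop one expression from stack b; return a list of (character, address)."""
    item = b.pop()
    cells, i = [], 0
    for ch in TABLE[item]:
        if ch == HOLE:
            i += 1
            cells.extend(output(b, (i,) + addr))  # OUTPUT(I.ADDR)
        else:
            cells.append((ch, addr))              # entry of the address table
    return cells


def erase(cells, current):
    """EOE-key: blank every position whose address ends with the current address."""
    n = len(current)
    return "".join(" " if addr[len(addr) - n:] == current else ch
                   for ch, addr in cells)


B = ["a4", "a3", "k2", "a1", "k1"]   # k1[a1,k2[a3,a4]], top at the list end
cells = output(B)
text = "".join(ch for ch, _ in cells)
cursor = text.index("A")             # a cursor-address on the single line
```

Looking up cells[cursor] yields the address 1.2.nil of the atom under
the cursor, and erase(cells, (2,)) blanks the whole second subexpression.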


6.  Editing:  Update  of  Expressions 


Until now we have described the passive part of the
editing system, e.g. the representation of
expressions, how they are displayed etc. Now we turn
our attention to the active part of the system
which allows the user to edit (= update, delete,
replace, etc.) expressions.


6.1. Format of the Screen Image

First of all we have to specify the screen image
used by the editor. The screen of a display should
be divided into four logical parts as shown in
Figure 14.

EXPRESSION-field

FA-field

MESSAGE-field

COMMAND-field

Fig. 14: Logical fields used by the editing system

The FA-field is the area in which the current Focus
of Attention is displayed via algorithm OUTPUT. In
the COMMAND-field the user may specify editor-
commands. The MESSAGE-field is used to display
additional information like error messages,
explanations of the commands etc.

The EXPRESSION-field is used to update
expressions. Figure 15 shows how the four fields can
be mapped onto the screen of a real display-
station. This display image is used by the editing
system of Berkling's Reduction Machine.

E -> EXPRESSION-field      C -> COMMAND-field

MESSAGE-field

FA-field

Fig. 15: Display image used by the editing system
of Berkling's Reduction Machine.


6.2. Editor-Commands

We have already mentioned that the Focus of
Attention, i.e. the expression which resides on top of
stack E, is displayed within the FA-field of the
screen. Now let us consider a subexpression of FA
which is given by the current cursor-position. We
will call this expression the CURSOR-expression
(CE) and denote its address by CEADDR.

All editor commands refer to CE; this means that
CE must be on top of stack E when the specified
command is going to be executed. Thus we have to
perform a SCROLLDOWN before and a SCROLLUP after
the execution of a command:

SCROLLDOWN(REVERSE(CEADDR))
execute specified monitor command
SCROLLUP(CEADDR)

In this paper we will give only the description of
two basic editing commands:

D:  Delete the CURSOR-expression
Cx: Copy the CURSOR-expression

The D-command replaces CE by a special atom called
the EMPTY-expression which prompts the user to
enter a new expression. This ensures that a user can
never generate incomplete expressions. Whenever he
deletes an expression he has to replace it by
another expression. The algorithm for the D-command
is:

DELETE(E); PUSH(EMPTY-EXPRESSION,E);

The copy-command copies CE either to an auxiliary
stack (x = STACKO, STACKS, etc.) or into an
expression library (x = name of an expression).
Copying is done in the following way: At first the
expression is copied to the I/O-stack B and from
there it is transported to the desired destination:

COPY(E,B); TRANSPORT(B,X);

A list of the commands used by the reduction
machine editor is given in [HOMMES 79].

6.3. Updating Expressions

Updating expressions in an expression-oriented
editor means: replace a subexpression by another
subexpression. This is always done in the same way:

1. The user enters an expression into the
   EXPRESSION-field and positions the cursor to
   the expression in the FA-field which he wants
   to replace.

2. Via algorithm SCROLLDOWN the expression which
   is going to be replaced is brought to the top
   of stack E.

3. Via algorithm INPUT the new expression is
   generated from the old expression, the
   program library, the auxiliary stacks, and the
   input specified by the user.

4. The expression to be replaced is deleted.

5. The new expression is moved from stack B to
   stack E. A scroll-up operation is performed
   to return to the previous FA, which now
   includes the replaced subexpression.

Figure 16 shows the contents of the stacks during
the different phases of replacement:

Fig. 16: Contents of stacks when replacing an
expression.

Outline of procedure REPLACE:

PROCEDURE REPLACE(ADDR: EXPRESSION-ADDRESS);
BEGIN
  SCROLLDOWN(REVERSE(ADDR));
  INPUT;
  DELETE(E);
  TRANSPORT(B,A);
  TRANSPORT(A,E);
  SCROLLUP(ADDR);
END

The algorithm INPUT performs the following
operations:

1. The expression specified by the user in the
   EXPRESSION-field is translated from its
   external representation to the associated
   internal representation by using the I/O-table.

2. EMPTY-expressions are inserted for missing
   subexpressions, i.e. the expression entered
   by the user is automatically completed.

3. References to other expressions are resolved.

4. When INPUT terminates, a complete expression
   has been generated within stack B.

There  are  three  references  to  other  expressions 
which  may  be  used  when  constructing  new  ex¬ 
pressions: 

express 1 on-address: 


always  denoted  by  an  expression  having  the 
following  format: 


ap  f  ...  t„  or  /  |  V 

*  •>  •  a„ 

i.e.  a  special  constructor  cal lad  applicator 
followed  by  a  function  and  its  arguments. 

Progress  written  in  an  applicative  language  are 
executed  by  resolving  applications,  i.e.  by 
applying  functions  to  its  arguments,  which  is  done 
according  to  e  set  of  rewriting  rules.  A  rewriting 
rule  specifies  the  expression  by  which  an  applica¬ 
tion  is  to  be  replaced. 

Example:  The  rewriting  rule  for  the  identity  func¬ 
tion  is  given  by 

ap 

/  \  — >  e 

id  e 

An algorithm which resolves applications can be
based on a TRANSPORT algorithm. The idea is to move
an expression from stack E to stack A, but to stop
the transport when the following situation occurs:
the applicator is on top of stack M, the function
is on top of stack A and the arguments are on top
of stack E. Then the application is resolved according
to the rewrite rule, i.e. the applicator is
popped out of stack M, the function is removed from
stack A, and the arguments on top of stack E are
replaced by the result.




An expression-address is replaced by its corresponding
expression, i.e. the expression-address
1.nil may be used to refer to the first
subexpression of the expression which is going
to be replaced.

Example: Entering 1.nil will replace an expression
by its first subexpression.

name  of  an  auxiliary  stack  or  of  an  expression: 

The  name  is  replaced  by  a  copy  of  the  expression 
which  is  either  on  top  of  an  auxiliary  stack  or 
in  the  expression  library.  This  reference  is 
used  to  retrieve  expressions  which  are  moved  to 
an  auxiliary  stack  or  to  the  library  by  using 
the  copy-command. 

Note: Expression references are resolved by
applying the basic COPY-algorithm.

This chapter has shown the basic features of an
expression-oriented editing system; [HONHES 79]
gives more information and shows especially how the
user can construct programs in such an expression-
oriented system.

7.  Evaluation  of  Programs 

The editing system described so far works for
arbitrary languages based on a constructor syntax.
Now we are going to restrict this class of
languages to applicative languages. These are
languages in which an application of a function is


[Figure: contents of stacks E, A and M before and after the rewrite]

Fig. 17: Resolving an application


Having resolved the application, the TRANSPORT
algorithm is activated again. When the expression
has been moved to stack A, all applications have
been resolved. The algorithm TRANSPORT (A, E) moves
the expression back to stack E.

The editor can be easily extended to allow interactive
execution of expressions or subexpressions.
Introducing the editor-command E
(= Evaluate) into the environment described in
II.6.2. will result in:



SCROLLDOWN (REVERSE (CEADDR));
EVALUATE;
TRANSPORT (A, E);
SCROLLUP (CEADDR);

By  using  the  cursor  any  subexpression  may  be 
selected  for  evaluation. 
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Abstract 

By using a functional programming
system as a machine language, a highly
parallel computer can be constructed.

A form of lazy evaluation, using incomplete
objects, provides a mechanism
for constructing a data flow computer
which directly executes programs written
using the functional programming system in
a highly parallel manner. Since a data
flow architecture is used, this parallelism
is not dependent on any specialized
parallel language or compiler. This
computer consists of three basic components:
a set of processors, a shared memory
containing only FP objects, and a queue
feeding functions to all processors. The
design is modular, allowing an arbitrary
number of processors, which need not be
identical.


INTRODUCTION 


forms, a set of definitions, and the operation of
application. Formal systems for functional programming
(FFP systems) use objects to represent
FP functions.

An object is either an atom, a sequence whose
elements are objects, or ⊥ ("bottom" or "undefined").
Atoms include numbers and identifiers. FP systems
whose sequence constructor is ⊥-preserving will
never allow ⊥ to be an element of a sequence.

Only in an FP system whose sequence constructor is
not ⊥-preserving could the sequence <X,⊥> be found.
The special atom φ is used to denote the empty
sequence, which is both an atom and a sequence.
Sequences will be represented by enclosing the
sequence elements in < and >. The application
operation is denoted by : so the application of
the function f to the object x would be written as
f:x.

All functions are applied to a single object.
Since all functions have only one argument, it is
unnecessary to give names to arguments. Because
all programs are composed only of such functions,
all variable names are completely eliminated.
Functions which would normally require more than
one argument are applied to a sequence containing
all of the needed arguments. A brief list of the
primitive functions to be used follows.
A new approach to data flow computers is
suggested by functional programming (FP) systems,
as described by Backus [1]. By introducing a form
of "lazy evaluation", similar to that used by
Friedman and Wise [3], in a computer whose machine
language is an FP system, a simple yet powerful
data flow computer results.

Unlike other parallel computers, data flow
processors [2,4,5,6] obtain parallelism directly
from its source: the natural data dependencies
between operations in a program. Such computers
are not bound to parallel languages or compilers,
but are able to introduce parallelism into all
programs without need of assistance above the
hardware level.

FUNCTIONAL PROGRAMMING SYSTEMS

This section will serve as a refresher on FP
systems and as a reference for later discussion of
FP systems. Only those aspects of FP systems
relevant to computer design will be reviewed. A
complete description of the FP system used here
can be found in Backus [1].

An FP system is described by five things: a
set of primitive functions, a set of functional


n:x             Where n is an integer. Find the nth element
                of the sequence x.

tl:x            Remove the first element of the sequence x.

id:x            The identity function. Return x unchanged.

atom:x          Tests if x is an atom. T is returned for
                true, F for false.

eq:<x,y>        Tests if x and y are equal objects.

null:x          Tests if x is φ.

reverse:x       Reverse the elements of the sequence x.

distr:<s,x>     Create a sequence of pairs formed by pairing
                each element of s with x, <s_i,x>.

distl:<x,s>     Like distr, except the pairs will have x for
                the first element, <x,s_i>.

length:x        Find the length of a sequence.

+:<x,y>         Add x and y. (-, ×, and ÷ are similar.)

and:<x,y>       And the booleans x and y. (Or and not
                functions are similar.)

trans:x         Transpose x, where x is a sequence of
                sequences identical in length.

apndl:<x,seq>   Append x to the left end of seq.

apndr:<seq,x>   Append x to the right end of seq.

apply:<f,x>     Apply the function f to the object x.

A ⊥ is produced whenever a function is
applied to an improperly formed object, such as
applying a selector to an atom or using a sequence
in place of a number for an arithmetic operation.

All functions are ⊥-preserving, returning ⊥ when
applied to ⊥. (But see the discussion later of
non-⊥-preserving functions.)
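
A few of these primitives can be sketched in Python to make the ⊥ behaviour concrete. This is a minimal sketch, not the system described in [1]: representing sequences as tuples (so that the empty tuple plays the role of φ), the BOTTOM sentinel, and the helper bottom_preserving are all assumptions made for the illustration.

```python
BOTTOM = object()  # assumed sentinel standing for the undefined object

def bottom_preserving(fn):
    """Wrap fn so that BOTTOM maps to BOTTOM and malformed input yields BOTTOM."""
    def wrapped(x):
        if x is BOTTOM:
            return BOTTOM
        try:
            return fn(x)
        except (TypeError, ValueError, IndexError):
            return BOTTOM
    return wrapped

def select(n):
    """The selector n: returns the nth element (1-based) of a sequence."""
    @bottom_preserving
    def sel(x):
        if not isinstance(x, tuple) or not 1 <= n <= len(x):
            return BOTTOM  # e.g. a selector applied to an atom
        return x[n - 1]
    return sel

@bottom_preserving
def tl(x):
    if not isinstance(x, tuple) or not x:
        return BOTTOM
    return x[1:]

@bottom_preserving
def add(x):
    a, b = x  # unpacking fails on anything that is not a pair
    if not all(isinstance(v, (int, float)) for v in (a, b)):
        return BOTTOM  # e.g. a sequence in place of a number
    return a + b

@bottom_preserving
def distl(x):
    y, s = x
    return tuple((y, e) for e in s)
```

For example, select(2) applied to (10, 20, 30) gives 20, while select(1) applied to the atom 5 gives BOTTOM.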

Functional forms are functions which use
other functions or objects as parameters. Forms
are used to create expressions involving functions.
The functional forms to be used are:

f∘g:x              Compose f and g. Returns f:(g:x).

[f1,...,fn]:x      Construct a sequence whose ith
                   element is f_i:x.

(p→f;g):x          If p:x is T return f:x, otherwise
                   if p:x is F then return g:x.

ȳ:x                Return y, a constant.

/f:x               Insert a binary function into
                   a sequence.
                   /f:<x1> ≡ x1; /f:<x1,...,xn>
                   ≡ f:<x1,/f:<x2,...,xn>>.

αf:<x1,...,xn>     Apply a function to all elements
                   of a sequence.

(while p f):x      While p:x = T apply f to x.
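
The forms listed above can be read as higher-order functions. The Python sketch below is an illustration only: the function names are invented here, sequences are assumed to be tuples, and Python truth values stand in for the atoms T and F (so the case where p:x is neither T nor F is not modelled).

```python
def compose(f, g):
    return lambda x: f(g(x))                      # f∘g:x = f:(g:x)

def construct(*fs):
    return lambda x: tuple(f(x) for f in fs)      # [f1,...,fn]:x

def condition(p, f, g):
    return lambda x: f(x) if p(x) else g(x)       # (p→f;g):x

def constant(y):
    return lambda x: y                            # always returns y

def insert(f):
    """/f as a right fold; f takes a pair. Assumes a non-empty sequence."""
    def ins(xs):
        if len(xs) == 1:
            return xs[0]
        return f((xs[0], ins(xs[1:])))
    return ins

def apply_to_all(f):
    return lambda xs: tuple(f(x) for x in xs)     # αf

def while_form(p, f):
    def loop(x):
        while p(x):
            x = f(x)
        return x
    return loop
```

Because insert folds from the right, inserting subtraction into <10,3,2> computes 10-(3-2).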

The state D contains all functions defined by
the user. A function definition associates a
name (an atom) with a function. Definitions are
denoted by def name ≡ function. All function
names must be either defined in D or known by
the system as primitive functions or forms. Since
D never changes during the execution of a program,
the set of functions defined for a particular
program is static.

THE INHERENT PARALLELISM IN
AN FP SYSTEM

The FP forms which directly imply parallelism
are: apply-to-all (α), insert (/), and
construction ([...]). Apply-to-all creates a
sequence by applying the same function to a variety
of objects, while construction creates a
sequence by applying a variety of functions to the
same object. Within these forms function evaluations
can proceed in parallel, due to the
absence of side effects.


The insert form computes a single result by absorbing
each element of a sequence into a dyadic
operator. If the operation being inserted is
associative (i.e., if f:<A,f:<B,C>> = f:<f:<A,B>,C>
for all objects A, B and C) then this form can
be highly parallel. An associative insert can
"tree in" the sequence rather than proceeding
serially through the sequence. Associative functions
would be recognized before program execution
and two different insert forms would be used:
insert and insert-associative.
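
The difference between a serial insert and one that "trees in" can be sketched as follows. This is a hedged Python illustration: the function names and the returned level count are assumptions added here, and the loop runs sequentially; the point is that all applications within one level are independent and could run at the same time.

```python
def insert_serial(f, xs):
    """Right-to-left serial insert: needs len(xs) - 1 dependent steps."""
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = f(x, acc)
    return acc

def insert_tree(f, xs):
    """Pairwise (tree) insert for an associative f.

    Returns the result and the number of levels; the applications inside
    one level do not depend on each other.
    """
    level = list(xs)
    depth = 0
    while len(level) > 1:
        nxt = [f(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired last element is carried up
        level = nxt
        depth += 1
    return level[0], depth
```

For a sequence of 8 elements the tree version needs 3 levels where the serial version needs 7 dependent applications. Both give the same answer only because f is associative.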

An interesting property of these forms is
that if a parallel construction form is implemented,
then parallel versions of the insert-associative
and apply-to-all forms can be expressed using
parallel construction. Assuming that the function
is being applied to the pair <function, object>
(the result of [2∘1, 2] in an FFP system), then
suitable definitions for the apply-to-all and
insert-associative functions are:

def APPLYTOALL ≡ null∘2 → φ̄;
                 apndl∘[apply∘[1, 1∘2],
                        APPLYTOALL∘[1, tl∘2]]

def INSERTASSOC ≡ eq∘[length∘2, 1̄] → 1∘2;
                  INSERTASSOC∘REDUCEPAIRS

def REDUCEPAIRS ≡ leq∘[length∘2, 1̄] → id;
                  [1, apndl∘[apply∘[1, [1∘2, 2∘2]],
                             2∘REDUCEPAIRS∘[1, tl∘tl∘2]]]

The function name leq is used for a less-
than-or-equal-to function. For the apply-to-
all function, if both the apply and APPLYTOALL
arguments to the apndl function are evaluated in
parallel, then eventually each application will
be running in parallel. In the case of INSERTASSOC,
the function REDUCEPAIRS will apply the
function being inserted to successive pairs in the
sequence, halving the length of the sequence.

This will be done in parallel, as with APPLYTOALL.
The INSERTASSOC function iteratively calls
REDUCEPAIRS, which trees in the sequence one level,
until the final result (the top of the tree) is
reached.
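
The behaviour described for APPLYTOALL, REDUCEPAIRS and INSERTASSOC can be rendered in Python as a rough sketch. The assumptions are that sequences are tuples, that the inserted function takes two arguments, and that the pair <function, object> is passed as two separate parameters; the explicit check for an empty sequence is an addition made here, since the recursion would otherwise never terminate.

```python
def applytoall(f, xs):
    """Apply f to the first element and recurse on the tail; the two
    halves of each step are independent and could run in parallel."""
    if not xs:
        return ()
    return (f(xs[0]),) + applytoall(f, xs[1:])

def reducepairs(f, xs):
    """Apply f to successive pairs, roughly halving the sequence length."""
    if len(xs) <= 1:
        return xs
    return (f(xs[0], xs[1]),) + reducepairs(f, xs[2:])

def insertassoc(f, xs):
    """Call reducepairs repeatedly until a single element remains."""
    if not xs:
        raise ValueError("insertassoc needs a non-empty sequence")  # added guard
    while len(xs) != 1:
        xs = reducepairs(f, xs)
    return xs[0]
```

With addition, reducepairs turns (1, 2, 3, 4, 5) into (3, 7, 5), and two further rounds reach the single result 15.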

PARALLELISM  IN  COMPOSITION 

Introducing parallelism into the composition
form is more difficult. The nature of composition
would seem to prohibit any sort of parallelism
due to the inherent data dependency between the
functions being composed. If it is required that
the data transferred between the functions is an
object in the usual sense, then parallelism is in
fact impossible. If, however, a function is able
to form partial results, then these results can be
passed between the functions, allowing some degree
of overlap. These partial results arise from the
ability to decompose (or factor) many functions.

To express partial results, incomplete objects
will be introduced. An incomplete object is an
object containing portions which have yet to be
determined, but which eventually will be filled
in. The FP system requires only one new "object"
to express these incomplete objects, the incomplete
atom ω. ω will serve as the fundamental unit of
incompleteness, capable of assuming any value on
completion. An ω can be thought of as a placeholder,
representing the result of an arbitrary
function which has not yet been finished.

Every ω will be associated with a completion
function. This completion function will eventually
specify a value to be used in place of the ω.
Formally, any ω should be identified by its completion
function. A more casual notation, in
which ω's with different completion functions
will be given different subscripts, will be used
herein. Of course, there may be many references
to the result of a single completion function.

When ω is used as a sequence in an append
function, a new sort of incomplete object is
created. If apndl:<X,ω1> is evaluated, the
result will be denoted by <X,Ω1>. Ω is called
the incomplete subsequence, and is used to indicate
a section of a sequence, of arbitrary length,
which has not yet been filled in. In this example,
ω1 and Ω1 have the same completion function, yet
the result of the completion function will be installed
within a sequence in the case of Ω1. For
example, if Ω1 (and ω1) complete to <Y,Z>, the
sequence will now be <X,Y,Z>, not <X,<Y,Z>>.

All Ω's will be found within a sequence. Any
time that an Ω completes to a non-sequence, an
error (⊥) will result. An Ω is not a separate
incomplete atom, but rather a different usage of
the basic incomplete atom ω. Any Ω_i will be
dependent on some ω_i for its completion function.

If ω appears within a sequence, it represents a
particular element of the sequence whose value is
as yet unknown, but if Ω appears in a sequence, it
represents a portion of the sequence itself which
is unknown. Any sequence containing Ω will be
termed an incomplete sequence; any object containing
either ω or Ω will be termed an incomplete
object.
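
The difference between completing ω and completing Ω inside a sequence can be shown with a small Python sketch. The class names and the install function are hypothetical, sequences are assumed to be tuples, and a Python exception stands in for the error ⊥.

```python
class IncompleteAtom:
    """Stands for ω: one element whose value is not yet known."""

class IncompleteSubsequence:
    """Stands for Ω: a run of elements, of unknown length, not yet known."""

def install(seq, placeholder, value):
    """Return seq with placeholder replaced by its completed value.

    An ω is replaced by the value as a single element; an Ω must complete
    to a sequence, whose elements are spliced into seq.
    """
    is_omega_seq = isinstance(placeholder, IncompleteSubsequence)
    if is_omega_seq and not isinstance(value, tuple):
        raise ValueError("incomplete subsequence completed to a non-sequence")
    out = []
    for item in seq:
        if item is not placeholder:
            out.append(item)
        elif is_omega_seq:
            out.extend(value)   # splice: <X,Ω> with <Y,Z> gives <X,Y,Z>
        else:
            out.append(value)   # element: <X,ω> with <Y,Z> gives <X,<Y,Z>>
    return tuple(out)
```

Completing the same value in the two ways yields <X,Y,Z> for Ω and <X,<Y,Z>> for ω.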

Conceptually, an incomplete object is a set
of objects. This set contains all possible values
the incomplete object may assume on completion.

For example, ω would be the set of all objects,
<Ω> would be the set of all sequences (including
φ), <ω1,ω2> would be the set of all sequences
of length 2, and so on. A partial ordering of
incomplete objects can be constructed using the
containment relation between their associated
sets. An incomplete object, X, is more complete
than another incomplete object, Y, if the set of
objects associated with X is a proper subset of
the set associated with Y. A complete object is
one whose set contains only one member, the object
itself.

When a function is applied to an incomplete
object, four different situations may arise:

1. The object is not sufficiently complete
for the function to have any effect. In this
case, the function must be deferred until the
object becomes more complete.

2. The function can be applied to portions
of the object, but must defer applying itself to
other sections of the object.

3. The function can be applied to the object,
but the result is still incomplete.

4. The function can be applied to the object
and the result is a complete object.

A few illustrations of these cases are:

1. +:<3,ω1> cannot be evaluated (at this
instant).

2. reverse:<A,B,Ω1,D,E> = <E,D,(reverse:<Ω1>),B,A>
= <E,D,Ω2,B,A>, where a new ω2 has
been created to hold the result of (reverse:<Ω1>).

3. 3:<A,B,ω1> = ω1. trans:<<ω1,ω2>,<ω3,ω4>>
= <<ω1,ω3>,<ω2,ω4>>.

4. 3:<ω1,B,C> = C. length:<ω1,ω2> = 2.
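
Some of these cases can be mimicked with a short Python sketch. It is only an illustration: the DEFER marker, the two placeholder classes and the function names are assumptions, sequences are tuples, and the input is assumed to be otherwise well formed.

```python
class IncompleteAtom:
    """Stands for ω, an element whose value is not yet known."""

class IncompleteSubsequence:
    """Stands for Ω, a stretch of a sequence that is not yet known."""

DEFER = object()  # assumed marker: the function must wait for more completion

def select(n, seq):
    """Selector n: blocked only by an Ω among the first n positions."""
    if any(isinstance(item, IncompleteSubsequence) for item in seq[:n]):
        return DEFER
    return seq[n - 1]  # may itself be an ω (case 3) or a complete object (case 4)

def length(seq):
    """Length needs a sequence without Ω; ω elements still count as one each."""
    if any(isinstance(item, IncompleteSubsequence) for item in seq):
        return DEFER
    return len(seq)

def add(pair):
    """Addition needs a complete object (case 1 otherwise)."""
    if any(isinstance(item, IncompleteAtom) for item in pair):
        return DEFER
    x, y = pair
    return x + y
```

Selecting the third element of <ω1,B,C> succeeds with C even though the first element is unknown, while +:<3,ω1> has to be deferred.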

A rather subtle problem has arisen here. By
postponing the completion of a sequence, the ⊥-
preserving nature of the sequence constructor has
been lost. For example, if 1:<A,ω1> is evaluated
to A, this result becomes incorrect if ω1 is
completed by ⊥ and the sequence constructor is
⊥-preserving. Thus, it is natural for an FP system
which uses incomplete objects to have a sequence
constructor which is not ⊥-preserving, preventing
entire sequences from being later replaced by ⊥.
(To further allow parallelism, it would be
possible to produce other functions which are not
⊥-preserving. An example of such a function would
be the and function. If and is defined so that
its result is F (false) if either element of the
pair it is applied to is F, then and:<F,ω1> could
be immediately evaluated to F.)

Incomplete objects are closely related to
the suspensions produced in lazy evaluation [3].
One difference is that incomplete objects imply
concurrent function evaluation while suspensions
imply delayed function evaluation. Another is
that conceptually, incomplete objects stay within
the realm of objects (with only ω added), while
suspensions are used transparently. The real
advantage in using incomplete objects rather than
suspensions lies in the clean notation of incomplete
objects and the ability to stay within the
set of objects.

THE  DESIGN  OF  AN  FP  COMPUTER 

The design goals of the FP computer will be:

1.  The  computer  will  use  an  FP  system  as  a 
machine  language. 

2.  The  memory  will  be  used  only  for  FP 
objects. 

3.  The  computer  will  be  data-driven; 
parallelism  will  result  naturally  from  data 
dependencies. 

4.  The  computer  will  be  modular,  allowing 
great  expansion  without  any  change  in  the  basic 
architecture. 

Goal 1 provides a computer which will enforce
a disciplined use of the memory at the hardware
level, preventing destructive updating and side
effects. Goal 2 allows the memory to be homogeneous.
Since only objects are being stored, the
memory is not forced into the conventional word
and address structure. Goal 3 attempts to produce
an ideal data flow computer by putting the burden
of parallelism onto the hardware. Goal 4 states
that the design should be expandable, allowing
great increases in computing power without changing
the underlying architecture.

Incomplete objects will be used to produce the
necessary parallelism. Two basic principles will
govern the use of incomplete objects. First, all
functions will be completion functions. This
associates each function with a place (an
incomplete atom) for its result. The second principle
is that incomplete atoms will be generated
by the function apply. This includes the use of
apply in most functional forms. For example,
f∘g:x would be treated as f:(g:x), so that two
incomplete atoms would be used, one for the result
of g:x and the other for the result of f:(g:x).

The FP computer will have three basic
components: a set of processors, a memory, and a
READY queue. The processors apply functions to
objects, the memory holds these objects, and the
READY queue feeds functions to the processors.

The READY queue functions as a "shared program
counter". All functions evaluated by the
processors must flow through the READY queue.
Whenever a function is ready to be executed, it is
placed into the READY queue. A queue element
(instruction) has four components. The format of
a queue element is:

<function, object, ω_result, D>

The function and object describe an application
to be performed, ω_result indicates the atom being
completed, and D is the state of the program. D
will be constant for all queue elements of a single
program. In a multiple program environment,
different programs could be distinguished by their
different D's.

The memory contains only objects. Objects
include queue elements, D's, functions, and incomplete
atoms. The memory must be managed, allowing
new objects to be created and removing objects
which have become garbage. When an incomplete
atom is identified as garbage, its completion
function must be terminated. Since it is important
to remove these garbage functions as soon as
possible, garbage should be identified immediately
when produced.

All incomplete atoms will have an attached
queue, similar to the READY queue. These queues
will contain functions which are blocked by an
input which is not sufficiently complete. Whenever
a function cannot evaluate, it attaches itself
to an incomplete atom blocking it. When an
incomplete atom is completed (actually, it still
can be replaced by an incomplete object, but it
will always become more complete), its queue is
attached to the READY queue.

The processors take queue elements from the
READY queue and execute them. Figure 1 gives a
simplified flowchart of processor operation. Three
distinct paths exist through this flowchart: one
for garbage functions, one for functions blocked
by incomplete objects, and one for functions which
are executed. Processors have three sorts of
functions to deal with: built-in functions, defined
functions, and forms. Built-in functions
have some standard representation recognized by
the processors; defined functions are fetched
from the state, D; and forms are handled through
the metacomposition rule. All inter-processor
communication is handled by the READY queue and
memory. No special inter-processor communication
hardware is required. Also, no processor has
any state saved between instructions.


MULTIPLE  PROCESSOR  TYPES 

The architecture can be expanded to accommodate
different types of processors. The only
addition needed is a READY queue for each processor
type. When a queue element is ready for execution,
it is placed into the READY queue corresponding
to the function within the queue element. This
allows a system to use a smaller number of processors
for functions which are costly to implement
or infrequently used. Also, a high speed arithmetic
processor would not be tied up executing non-
arithmetic functions.

One very useful processor type would be a
processor which only checks for executable functions
(functions whose object is sufficiently
complete to allow execution of the function). This
very simple processor would remove this burden
from processors with computing abilities.

COMPARISON  WITH  OTHER  DATA  FLOW  COMPUTERS 

A broad definition of a data flow processor
[6] is one in which the execution sequence is controlled
by data dependencies. Many data flow
computers require that a model for the partial
ordering of the execution sequence be constructed
before execution, at a time when data dependencies
cannot be completely located. The FP computer,
however, needs no such model since data dependencies
are manifested during program execution.
Furthermore, the FP computer allows specialized
processors, and program control is not directed from
a single master processor.

The use of an FP system for a machine language
induces single assignment behavior [5], which is
also found in pure LISP [3,4]. FP systems provide
a more practical machine language than LISP [3,4]
since FP systems do not use a changing environment
or variable names. Selectors are much more suitable
for accessing values at the machine level than
names.

The placing of a queue element into the READY
queue corresponds to firing [2], but the FP computer
does not know if the queue element is actually
ready for execution. An FP operation may "fire"
several times, each time waiting for a more complete
input, until the operation is finally performed.

The overhead involved with parallelism lies
in encountering functions which are found to be
unexecutable due to an insufficiently complete
object. This overhead is usually limited for a
particular function, since only a limited number
of stages of completion are possible for objects.
For example, the + function normally will see a
maximum of only 3 stages of completion of its
argument, such as <ω1,ω2>, <ω1,n2>, <n1,n2>.


IMPLEMENTATION  OF  THE  FP  COMPUTER 

This section outlines those features of the FP
computer which relate to parallel processing.

The  Functional  Forms 

Composition: Composition uses an incomplete
atom to link the functions being composed. When
<f∘g, x, ω_result, D> is executed, a new incomplete
atom, ω_temp, is created. The function g is started
by placing <g, x, ω_temp, D> in the READY queue. The
function f is placed in the queue attached to
ω_temp, in the form of the queue element
<f, ω_temp, ω_result, D>. As soon as g produces its
first partial result, f will attempt to proceed.
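
The interplay of the READY queue, the queues attached to incomplete atoms, and composition can be sketched as a single-processor simulation in Python. This is a simplified illustration under several assumptions: the class Atom, the tuple ("compose", f, g) and the function names are invented here, the state D is omitted, and an atom is completed in one step to a whole value, so partial results are not modelled.

```python
from collections import deque

class Atom:
    """Incomplete atom: a slot for a result plus a queue of blocked elements."""
    def __init__(self):
        self.done = False
        self.value = None
        self.waiting = deque()

READY = deque()  # queue elements are (function, object, result_atom)

def complete(atom, value):
    """Fill in the atom and move its blocked elements to the READY queue."""
    atom.value, atom.done = value, True
    READY.extend(atom.waiting)
    atom.waiting.clear()

def step():
    """One processor cycle: take an element and block, expand or execute it."""
    fn, obj, result = READY.popleft()
    if isinstance(obj, Atom):
        if not obj.done:                      # input not complete: block on it
            obj.waiting.append((fn, obj, result))
            return
        obj = obj.value
    if isinstance(fn, tuple) and fn[0] == "compose":
        _, f, g = fn
        temp = Atom()                         # links g's result to f's input
        READY.append((g, obj, temp))          # g is started
        temp.waiting.append((f, temp, result))  # f waits on the temporary atom
        return
    complete(result, fn(obj))

def run(fn, obj):
    result = Atom()
    READY.append((fn, obj, result))
    while READY:
        step()
    return result.value
```

Running the composition of "add one" after "double" on 5 first executes the doubling, which completes the temporary atom and releases the waiting element for the outer function.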

Construction: Construction forms a sequence
of incomplete atoms. When <[f1,...,fn], x, ω_result, D>
is executed, a result, <ω1,...,ωn>,
is immediately formed. Also, for each f_i the
queue element <f_i, x, ω_i, D> is added to the
READY queue.

Apply-to-all: The only difference between
construction and apply-to-all is that apply-to-
all may be applied to an incomplete sequence. If
<αf, <x1,...,Ω_i,...,xn>, ω_result, D> is executed,
the result will be <ω1,...,Ω_j,...,ωn>. The
queue element <αf, <Ω_i>, ω_j, D> will be attached to
the queue of ω_i. Otherwise, all other functions
will be attached to the READY queue as with the
construction form.

Insert-associative: When applied to a complete
sequence, insert-associative can be implemented
in terms of other forms. When applied to
an incomplete sequence, this form is similar to
apply-to-all.

Condition: There are two ways to implement
the conditional form, (p→f;g): parallel and non-
parallel. Both would have the same semantics, but
a parallel conditional would evaluate p, f, and
g in parallel. This is not always desirable,
since considerable processing might be wasted
evaluating the alternative which will not be chosen.
This is a real problem in loops closed by a
conditional, since a parallel condition form would
look ahead beyond the end of the loop. Other
times, however, parallel evaluation of p, f, and
g will speed up execution.

For the non-parallel condition, evaluation of
<(p→f;g), x, ω_result, D> will create a new functional
form, choose. <(choose f g x), ω_temp, ω_result, D>
will be placed on the queue of ω_temp and
<p, x, ω_temp, D> will be placed on the READY queue. Once
p returns a value, the choose form will be activated,
which will select either f:x or g:x as a
result.

The parallel conditional can be expressed in
terms of construction and a new primitive function,
cond. A parallel (p→f;g) would be expressed by
cond∘[p,f,g], where cond behaves like (1→2;3).

The parallelism results from the parallel function
evaluation used by construction. When p returns
a value, the unused function, f or g, will become
garbage and terminate.
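
A rough Python analogue of the parallel conditional uses a thread pool to start p, f and g together. It is a sketch under stated assumptions: parallel_conditional is a name invented here, Python truth values stand in for T and F, and the unused alternative is simply ignored once it finishes rather than being terminated as garbage.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_conditional(p, f, g):
    """Evaluate p, f and g concurrently, then keep only the selected result."""
    def run(x):
        with ThreadPoolExecutor(max_workers=3) as pool:
            fut_p, fut_f, fut_g = (pool.submit(fn, x) for fn in (p, f, g))
            # Only the chosen future is inspected, so a failure in the
            # unused alternative does not affect the result.
            return fut_f.result() if fut_p.result() else fut_g.result()
    return run
```

The cost noted in the text is visible here: both alternatives are always computed, even though one result is discarded.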

Primitive Functions: Different primitive
functions require various degrees of completeness
before being executed. A few examples are:

+ requires a complete object.
length requires a complete sequence.
3 requires a sequence whose first 3 elements
are not Ω's.
id permits any incomplete object.


The  only  other  aspect  of  primitive  functions 
related  to  parallelism  is  the  ability  of  some 
functions  to  decompose  themselves  when  applied  to 
incomplete  sequences  (see  "reverse"). 

Processor Synchronization

Only two operations require synchronization
of the processors. First, requests for new
objects must be synchronized. This can be accomplished
by various techniques, depending on the
exact memory organization. The simplest would use
a conventional free list protected from multiple
accesses with a semaphore. An "intelligent" memory
might be able to handle multiple memory requests
internally.

The other need for synchronization lies in
the only object which can be updated: the incomplete
atom. The time between finding an incomplete
atom and attaching an element to its queue
must be protected from completion of the atom.

This could be accomplished with a semaphore on
each incomplete atom. Since these queues are
not as active as the READY queue and the time
duration between finding an incomplete atom and
using its queue is short, little time would be
lost on processor synchronization.
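
The protection described for an incomplete atom can be sketched with one lock per atom. This Python fragment is an illustration, not the paper's design: the class and method names are assumptions, and a threading.Lock is used in place of the semaphore mentioned in the text.

```python
import threading

class IncompleteAtom:
    """An updatable slot whose queue is guarded against concurrent completion."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self._value = None
        self._waiting = []

    def attach_or_get(self, element):
        """Atomically test the atom and attach element if it is still incomplete.

        Returns (True, None) if the element was attached, or (False, value)
        if the atom had already been completed.
        """
        with self._lock:
            if self._done:
                return False, self._value
            self._waiting.append(element)
            return True, None

    def complete(self, value):
        """Store the value and hand back the blocked elements.

        The caller is expected to place the returned elements on the READY queue.
        """
        with self._lock:
            self._done, self._value = True, value
            released, self._waiting = self._waiting, []
        return released
```

Because the test and the attachment happen under the same lock, an element can never be attached to a queue that has already been released.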

The  READY  Queue 

The READY queue must be an extremely fast
queue, since all functions must pass through it.

As long as all instructions put into the READY
queue are eventually given to processors, it is
not important to force specific queue behavior on
the READY queue. Also, it is not necessary to
have multiple READY queues for different processors
if processors pull only the type of functions
they need from a single READY queue, although
this could involve unnecessary waiting for the
proper function type.

A PROGRAMMING EXAMPLE

A characteristic example of the parallelism
introduced by the FP computer is found in a
sorting program. A merge-sort program written in
an FP system might be:

def SORT ≡ (/MERGE)∘(α[id])

def MERGE ≡ null∘1 → 2; null∘2 → 1;
            GREATER∘[1∘1, 1∘2] → apndl∘[1∘2, MERGE∘[1, tl∘2]];
            apndl∘[1∘1, MERGE∘[tl∘1, 2]]

Since MERGE is associative, /MERGE can be implemented with an insert-associative form. One kind of parallelism will result from the use of the insert-associative: the MERGE functions will be arranged in a tree and all merges in a level of the tree will execute in parallel. Another kind of parallelism arises when the MERGE operations produce partial results through the use of incomplete sequences. Each time a MERGE produces an element of its result, this element is immediately fed into the next higher MERGE. A diagram of the data flow is given in Figure 2.
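
As a rough illustration of the tree-shaped evaluation, the SORT and MERGE definitions can be mirrored in sequential Python. The function names are chosen for this sketch only, and the sketch models just the tree arrangement of the merges; it does not model incomplete sequences or any actual parallel execution.

```python
def merge(a, b):
    """Merge two ascending lists, following the shape of the FP MERGE definition."""
    if not a:
        return b
    if not b:
        return a
    if a[0] > b[0]:
        return [b[0]] + merge(a, b[1:])
    return [a[0]] + merge(a[1:], b)


def insert_associative(f, items):
    """Reduce items pairwise, level by level, as a balanced tree of f applications."""
    level = list(items)
    while len(level) > 1:
        # Every call to f within one level is independent of the others,
        # so on the FP computer these could all run at the same time.
        nxt = [f(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def fp_sort(xs):
    """SORT: turn each element into a one-element sequence, then insert MERGE."""
    if not xs:
        return []
    return insert_associative(merge, [[x] for x in xs])


result = fp_sort([5, 2, 9, 1, 5, 6])
```

The recursive `merge` is kept close to the FP text for readability; for long inputs an iterative merge would avoid Python's recursion limit.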

This parallelism was achieved completely by the computer; no explicit parallelism was embedded in the program. This example should serve as an indication of the amount of parallelism which would naturally occur when a program is run on an FP computer.


CONCLUSIONS 

Functional programming systems provide a basis for a computer architecture which introduces parallelism at the most basic level: the machine language. Through the use of incomplete objects, a completely data-driven computer has been designed. Parallelism has been achieved without complex synchronization mechanisms or complex inter-processor communication networks. Furthermore, the computer could accommodate very large numbers of processors for the introduction of a very high degree of parallelism.

This computer has the additional benefit of a structured machine language with simple and clean semantics. No instructions are provided for the introduction of parallelism; this comes automatically. Thus, all programs run on this computer take advantage of available parallelism without the aid of special parallel languages or compilers. Parallelism does not change the semantics of a program, allowing the programs to be analyzed without regard to parallel behavior.

On Architectures for Document Preparation

ABSTRACT - We claim that the principal limitation in the performance of current document preparation programs lies in the inability of the underlying architecture to efficiently execute the most frequently performed operation: the movement of data and its reorganization in the computation of line images of the output. We present the design of a unit intended to expedite this data rearrangement, in the context of a macro-architecture, and show how this unit can be generalized to a variety of other processing tasks.


1.  INTRODUCTION 

Architectures are described which utilize VLSI technology to directly address the problems of document formatting, to which computers are being applied with increasing frequency in the rapidly evolving field of office automation.

Both a macro-architecture and a micro-architecture are described. The macro-architecture presents a framework within which to develop all of the functions associated with document processing. The micro-architecture is a specification of the design of a particular aspect of document processing.

Our particular micro-architecture addresses the area of text formatting. The major component of this architecture is the Fill Line Unit (FLU), which performs a function in DPM's analogous to that performed by the ALU's in conventional machines. It provides a first example of the realization in hardware of the many functions associated with text processing.


2.  MOTIVATION 

Computer technology is generally described as having progressed through several stages of evolution, usually referred to as generations:

• First generation (1950-1957) - vacuum tubes and miscellaneous main memories

• Second generation (1958-1964) - transistors and random access magnetic core memories

• Third generation (1965-1975) - small scale integrated circuits and random access magnetic core or solid state memories

• Fourth generation (1975-present) - medium scale integration and solid state random access memories


During the same period of time there has been a steady shift from primarily arithmetic and control computation to the mixture of arithmetic and symbolic computation typified by document preparation and the so-called "office automation."


The changes in technology have been reflected in the architecture of the processing units. The introduction of a bus structure was brought about by the availability of large numbers of registers in the processing unit with the transition to third generation systems. The introduction of cache memories came with the availability of solid-state memories. However, the architecture of computers has not been dramatically affected by the changes in the typical application mix.


In [7] Mukhopadhyay surveys architectural considerations for non-numeric processing and points out that, "With the proliferation of computers in all spheres of human civilization, most of what will be expected of future computers will be non-numerical... Existing computer architecture does not provide efficient non-numeric computation."

Much of the research in non-numeric processing of late has centered on searching, sorting and pattern matching hardware for database machines [8]. Architectures for document preparation, and in particular text formatting systems, need not use special hardware for searching, sorting or pattern matching.

We believe that the major improvement in computer architecture required by document preparation systems is the rapid and efficient rearrangement of data in memory, with relatively minimal associated processing. The proof of such an assertion is likely to be quite difficult, but the data in Table 1 show the effect of 'line filling' alone on an admittedly simple document processor, roff [11].


# of      processing time      processing time
lines     w. line filling      w/o line filling
          (cpu seconds)        (cpu seconds)

101       2.3                  1.5
544       9.4                  5.3
876       11.4                 6.5

Table 1. Comparison of Processing Time
W and W/O 'Line Filling'


In Table 1, the same documents were run through the document processor twice: once with 'line filling', in which case lines are right and left justified, and once without 'line filling', in which case the text is printed without rearrangement. In line filling, the text is arranged so that, on each line, the maximum number of words is included, and if these do not quite fill the line, then the words are spaced out by inserting added blanks between words. Although this incremental processing requires relatively little computation, it is very intensive in data movement.

In this paper we describe a document preparation component, the Fill Line Unit (FLU), which can be used to enhance the capability of machines used heavily for this type of non-numeric computation. The augmentation of architectures by means of such add-on units has many precedents in the evolving architecture of computers: extended arithmetic capability, I/O channels, cache, memory mapping, and direct memory access are such enhancements which have been introduced as the technology became appropriate.


3. MACRO-ARCHITECTURE

We now describe a macro-architecture (see Figure 1) as a framework for explicating the concept of the FLU. In this architecture user text is kept in a Line Memory (LM), a buffer's worth of lines for each active user. The state of a formatting process is kept at any time in a register bank indexed by user. Among the registers are the file descriptor register (FDR), the line address register (LAR) which points to the next line in a user's buffer, a memory data register (MDR) which contains a line fetched from line memory or gotten from an I/O device, and the line count register (LCR) which contains the number of lines left to process in a user's buffer.

Typically, a user's process index is placed in the Bank Select Register, causing the user's process registers to be selected. The LAR is used to address the next line in the Line Memory to be processed. This line is accessed and concatenated with the present contents of the MDR, the FLU unit is activated and the result placed back in the MDR. If all the lines of a given user's buffer area have been processed, then a new buffer's worth is brought into the LM.


4. DESIGN AND IMPLEMENTATION OF THE
MACRO-ARCHITECTURE

So  far  we  have  specified  a  framework 
within  which  a  FLU  could  be  utilized.  Now 
we  specify  the  details  associated  with  a 
text  formatting  application. 

The registers in the register bank have thus far been only partially specified. The text formatter registers in the register bank consist of the left margin register (LMR) which contains the position of the left margin on a line of output text, the right margin register (RMR) which contains the position of the right margin on a line of output text, the page number register (PNR) which contains the number of the current page, the line space register (LSR) which contains the number of spaces between output lines, the page length register (PLR) which contains the number of lines in an output page, the header register (HR) which contains the header line to be placed on each output page, the footer register (FR) which contains the footer line to be placed on each output page, the piece register (PR) which contains the piece of the MDR that is left after the leftmost portion of the MDR is output, the header bit (H) which is set if a header is to be output, the footer bit (F) which is set if a footer is to be output, and the fill bit (FL) which is set if the line filling operation associated with the MDR is to be activated.

The text formatter accepts text to be formatted along with commands describing the output format of the text. Ideally, we envision a command language that resembles the language used to edit manuscripts. To be brief, we will confine our command language to be rather conventional (see Table 2). It is essentially identical to that proposed in [3].

Both commands and data are resident in line memory. Lines are organized in terms of flytes (short for flagged bytes); there are N flytes to a line. The format of a flyte is


| type | value |


In our case there are two types of flytes: data and commands. For the data flyte the value is the internal data character representation. For command flytes the value is an instruction to be performed, the total format of which is


| type | fmt | op | opnd 1 | ... | opnd n |


Here the command may be comprised of several flytes. The fmt, or format, field describes the composition of the rest of the instruction. For instance, it might specify the number of operands. The op field provides the command which is to be performed. A comprehensive treatment of the selection of instruction formats can be found in [4,5].

Let us consider an instantiation of this format for our instruction set.

Here we consider a flyte to be eight bits long and character data to be in ASCII representation. Since internal ASCII involves only 7 bits, we can have a type field of 1 bit and still fit a data character in. There will be two command formats, no operand and one operand, distinguished by the setting of the second bit. The no operand format will take up the remaining 6 bits of the flyte; the one operand format will also latch on to the next 8 bit flyte (which must have its left bit set) for its argument (range 0-127).

no operand:


| 1 | 0 | command |


one operand:


| 1 | 1 | command |   | 1 | operand |
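
The 8-bit layout just described can be made concrete with a small encoder and decoder. This is a sketch under stated assumptions: the type bit is taken to be the most significant bit (1 for commands, 0 for data, as the diagrams suggest), the function names are invented for the example, and the opcode numbers used are arbitrary rather than taken from the paper's command table.

```python
TYPE_BIT = 0x80     # assumed: leftmost bit, 1 = command flyte, 0 = data flyte
FORMAT_BIT = 0x40   # second bit, 1 = one-operand format


def data_flyte(ch):
    """Encode one 7-bit ASCII character as a data flyte."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("data flytes hold 7-bit ASCII only")
    return code


def command_flytes(op, operand=None):
    """Encode a command as one flyte, or two when it carries an operand."""
    if not 0 <= op < 64:
        raise ValueError("the command field is 6 bits wide")
    if operand is None:
        return [TYPE_BIT | op]
    if not 0 <= operand < 128:
        raise ValueError("operand range is 0-127")
    return [TYPE_BIT | FORMAT_BIT | op, TYPE_BIT | operand]


def decode(flytes):
    """Split a flyte sequence into ('data', ch) and ('cmd', op, operand) items."""
    items, i = [], 0
    while i < len(flytes):
        f = flytes[i]
        if not f & TYPE_BIT:
            items.append(("data", chr(f)))
            i += 1
        elif not f & FORMAT_BIT:
            items.append(("cmd", f & 0x3F, None))
            i += 1
        else:
            items.append(("cmd", f & 0x3F, flytes[i + 1] & 0x7F))
            i += 2
    return items


# Hypothetical opcode 5, first without and then with an operand of 72.
line = [data_flyte("H")] + command_flytes(5) + command_flytes(5, 72)
decoded = decode(line)
```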


5. THE FILL LINE UNIT: THE MICRO-ARCHITECTURE

The unit which we have chosen to call the Fill Line Unit (FLU) plays a role in document preparation analogous to that played by the arithmetic logic unit (ALU) in scientific computation. Like ALU's, FLU's have a decomposition theory which allows descriptions as serial, series-parallel, or parallel realizations with the appropriate equipment/speed tradeoffs. In ALU's, each stage is a function of two operands and a carry. In FLU's, the corresponding situation is seen in Figure 2, which is a simplification of the FLU.

Here I is a register which stores l flytes, where l is chosen large enough to generate a complete line image of characters. The data in I at cycle n are used to generate the output line L, and any extra flytes are then stored in O. Thus the functional dependencies are:

L_n = g(I_n)

O_n = h(I_n)

Further, the new value of I is determined by the values of O and MDR:

I_n = f(MDR_n, O_{n-1})

(The analogy between an ALU and a FLU can now be seen more clearly, since O is like a carry and L is like a sum.) The two units of combinational logic shown in Figure 2, S1 and S2, are then the primary objects of interest in the synthesis of the FLU.

The above discussion has ignored signals which originate in S2 and set global state information, and signals feeding the global state information into the FLU.

The complexity of the FLU is thus seen, in Figure 2, to depend on the complexity of the units S1 and S2, the remaining units being conventional registers. The role of the S1 unit is to shift inputs from MDR to the right by the length of the data in the O unit, with the non-empty data in O being transferred directly into the leftmost stages of I. S1 performs a uniform shift of all the elements of MDR; symbolically,

MDR_{n-1,k} -> I_{n,k+d}

where R_{x,y} is the contents of stage y of a given register R at time x, and d is the amount of shift required. If the size of the MDR is s flytes, and each flyte consists of k bits, then the complexity of S1 will be proportional to k*s*log(dmax), where dmax is the maximum possible shift required. In Figure 3, we show a realization of an S1 unit for dmax = 3, k = 1, and s = 3.

In Figure 3, the binary encoded shift control on the left is via a register q which requires log(dmax) bits of storage to control the shift operation of S1, and for the pth shift control bit, q_p, stage i is shifted right q_p * 2^p places; i.e., no right shift if q_p = 0, and a right shift of 2^p if q_p = 1.
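
The binary-encoded shift control of S1 behaves like a logarithmic (barrel) shifter, which can be modeled in software as below. The sketch assumes that control bit p selects a shift of 2^p places, that vacated stages are empty (shown as `None`), and that the register is wide enough for nothing to fall off the right end; the function name is hypothetical.

```python
def s1_shift(mdr, d, width):
    """Shift the flytes of `mdr` right by d places in a register of `width` stages.

    The shift is applied one control bit at a time: bit p of d, when set,
    moves every stage 2**p places to the right.
    """
    stages = list(mdr) + [None] * (width - len(mdr))
    p = 0
    while (1 << p) <= d:
        if (d >> p) & 1:
            step = 1 << p
            # Anything shifted past the last stage is dropped; the caller is
            # assumed to choose `width` large enough that this never happens.
            stages = [None] * step + stages[:len(stages) - step]
        p += 1
    return stages


shifted = s1_shift("abc", 3, 6)
```

Only about log2(dmax) layers of selection are needed, one per control bit, which matches the k*s*log(dmax) complexity estimate given for S1.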

The S2 is considerably more complicated since it performs decoding of the flytes to interpret the embedded control information and non-uniform shifts. We can imagine the structure of the S2 as a uniform cascade of stages as shown in Figure 4. Figure 4 is a conceptual decomposition of the S2 into a linear cascaded array of identical flyte stages. If I_1 contains data, the control unit of stage 1 will pass the control signals through and generate a shift of the appropriate amount. If I_1 contains a control flyte, and is therefore not to be shifted to O, then the control information passed to adjacent units is modified, and I_1 would be deleted. It might then be necessary for units to the right of I_1 to cause a left shift of their contents to L.

We do not give a complete description of the S2 but describe only the logic needed to generate a filled line, omitting the logic needed for the other commands and functions. Further, we shall describe the processing as done in a single clock cycle. Assuming a maximum line length between 128 and 255, the following bus lines are required (of course, using more clock cycles allows fewer bus lines since lines can be shared among functions):


Function                # of bits     Notation

right margin            8             RM[1-8]
right end of text       8             RE[1-8]
word count              6             CT[1-6]
fill status             1             F
fill shift              5             FS[1-5]
fill parameter          2             FP[1-2]
rightmost space seek    1             S
actual shift            5             SH[1-5]

Starting at the left, the word count is set to zero and passed to the right, being incremented at each space following a non-space. The right margin position, r, is encoded on RM. At position r this information is decoded and passed to the left on S until the first space immediately to the right of a non-space, at position s, and position s is then encoded on RE. The difference between RM and RE is then placed on FS and the value of FS divided by CT is placed on FP.

FS_i is the incremental amount of shift required at stages following i to right justify the line, and SH_i is the actual shift of stage i. SH_i and FS_i are computed from SH_{i-1} and FS_{i-1}, with SH_1 = 0. If I_i is not blank, SH_i = SH_{i-1} and FS_i = FS_{i-1}. If I_i is blank and I_{i-1} is not blank, then if FS_{i-1} >= FP, FS_i = FS_{i-1} - FP. (Note that FS_i + SH_i = FS_1; the sum of the actual shift at stage i and the added shift still required is a constant.)
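
The stage-by-stage recurrence for SH and FS can be traced in software. The sketch below is one reading of the rules above, not the paper's logic design: it assumes that SH grows by FP whenever FS shrinks by FP (which the stated invariant implies), that a blank opening a gap moves with the new shift, and that FS and FP are supplied by the caller rather than derived from RM, RE and CT. All names are illustrative.

```python
def stage_shifts(stages, fs1, fp):
    """Return the actual shift SH_i for every stage of the line.

    fs1 is the total added shift needed to right justify the line (FS at the
    leftmost stage) and fp is the amount handed out at each word gap.
    """
    sh, fs = 0, fs1
    shifts = []
    for i, ch in enumerate(stages):
        opens_gap = ch == " " and i > 0 and stages[i - 1] != " "
        if opens_gap and fp > 0 and fs >= fp:
            fs -= fp
            sh += fp
        shifts.append(sh)
        assert fs + sh == fs1  # the invariant noted in the text
    return shifts


def apply_shifts(stages, shifts, width):
    """Move every stage right by its shift; untouched positions stay blank."""
    out = [" "] * width
    for i, ch in enumerate(stages):
        out[i + shifts[i]] = ch
    return "".join(out)


text = "ab cd ef"          # 8 columns of text, right margin at 12
shifts = stage_shifts(text, fs1=4, fp=2)
justified = apply_shifts(text, shifts, 12)
```

When FS is not an exact multiple of the word count, this literal reading leaves some shift undistributed; how the FLU handles that remainder is not stated in the text, so the sketch does not attempt it.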

Current component densities are adequate to contain a fully parallel FLU on a single chip for a maximum line size of 130 characters [6].

6. EXTENSIONS TO OTHER TP FUNCTIONS:
ENHANCING THE MICRO-ARCHITECTURE


The FLU described in the previous section has shown how to implement many of the classical functions as described in [3]. Other text processors may choose to add functions to these to produce a Cadillac version text processor. While, for reasons of style, we prefer the simpler text processors, especially in an expository treatment, it is worth considering briefly how the architecture described is adaptable to some of these deluxe features. The two which we shall describe are text macros and hyphenation.

6.1 TEXT MACROS

A text macro is a sequence of flytes which replaces a single flyte in the source text prior to execution. We shall assume, for simplicity, that the replacement text is fully expanded although, in principle, it need not be. Let m be the macro variable flyte invoking the macro and assume that m occurs in a source line x m y. Assume further that M is the expansion of m. Then after macro substitution the source text is x M y, where M is a sequence of flytes. The transformation from x m y to x M y does not affect x and involves shifting y to the right by length(M) - length(m). Then the substitution text, M, must be placed in the resultant gap. Now the FLU architecture is designed to facilitate exactly this kind of data movement.

Within the context of the FLU, the macro definitions could be stored in an associative ROM. Upon invocation of the macro the replacement text would be retrieved from the ROM and shifted to the appropriate position (using the S1 unit).

Conditional expansion of macros based upon macro variable flytes and external variables (e.g., register contents, transformations on register contents) is also possible. A condition PLA having inputs of macro variable flytes and external variables can generate an output c depending upon which conditions are met. The associative ROM holding the macro definitions would be accessed by the key (m,c), where m is the macro variable flyte. The input (m,c) would act as a composite key for the macro definition.


6.2 HYPHENATION

Most  hyphenation  schemes  depend  on  some 
simplified  algorithm  to  approximate 
correct  hyphenation.  We  shall  assume  that 
we  have  available  a  small  hyphenation  box, 
H,  whose  function  is  as  follows:  Given  a 
sequence  of  n  letters  representing  the 
tail  of  a  word  (possibly  the  whole  word), 
and  a  parameter  q,  H  will  determine  the 
place  closest  to  and  less  than  q  where  a 
hyphen  can  be  placed.  While  we  have  not 
studied  hyphenation  algorithms  in  detail, 
we  do  not  think  that  the  design  of  such  a 
unit  is  extremely  difficult. 

Now the FLU will gate the word to be hyphenated to H with parameter q indicating where the hyphenation is needed and will use the returned signals to control shifting and line filling.
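
The contract of the hyphenation box H can be stated compactly in code. The sketch assumes that the permissible break positions for a word are already known (here they are simply passed in as a list, since the paper does not fix a hyphenation algorithm); the function names and the example break positions are hypothetical.

```python
def hyphenation_box(break_points, q):
    """Return the permissible break position closest to and less than q, if any."""
    candidates = [b for b in break_points if b < q]
    return max(candidates) if candidates else None


def split_at_hyphen(word, break_points, q):
    """Split `word` as the FLU might after consulting H; None if no break fits."""
    pos = hyphenation_box(break_points, q)
    if pos is None:
        return None
    return word[:pos] + "-", word[pos:]


# Assumed break positions for "hy-phen-a-tion": after 2, 6 and 7 letters.
head, tail = split_at_hyphen("hyphenation", [2, 6, 7], q=8)
```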


7. EXTENSION TO THE HOST ARCHITECTURE:
ENHANCING THE MACRO-ARCHITECTURE

In Section 3 we provided a strictly vanilla architecture as a vehicle for presenting the FLU. We believe that such an architecture can be generalized to one of a document preparation machine. Pertinent ideas to this end will now be presented, but in the context of a text processing environment.

The architecture of a computer system should be responsive to the needs of the user. In a text-formatting environment, there is a need for entering information from interactive terminals and outputting formatted information to printers or terminals.

Users input requests and the system translates them into actions that it can execute. These actions can be realized by functional units, micro-coded subroutines, etc. For instance, the request format(file descriptor) might be translated by the system into actions which include: transform(line), get(buffer), output(line). Here transform(line) would get the next line from a main memory buffer and format it for printing, output(line) would give the line to a suitable output device, and get(buffer) would replenish the line buffer.

For a given request, its associated actions are related by rules for their application. These rules can be represented by a state diagram where the states represent the actions and the transitions represent their outcomes (see Figure 5).

In the high-level architecture for the text-formatting machine there exists a supervisory unit which contains the state diagrams for all requests and which sequences through these actions as the requests progress.

Figure 6(a) gives a conceptual view of the Supervisor, Figure 6(b) gives a suitable refinement, and Figure 6(c) gives the execution cycle for the refinement. Note that terminals put requests on a queue which is eventually processed by the Supervisor. The outcomes of each action execution are fed back to the Supervisor for further processing according to the state diagram associated with the executing request.
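
A software model may help make the Supervisor's role concrete. In the sketch below the state diagram is a table keyed by (action, outcome). Since Figure 5 is not reproduced here, the particular transitions, the EOF outcome, and the toy action implementations are all assumptions made for illustration; only the action names and the OKAY and EXHAUSTED outcomes come from the text.

```python
from collections import deque

# Hypothetical state diagram for the format request: (action, outcome) -> next action.
FORMAT_DIAGRAM = {
    ("transform", "OKAY"): "output",
    ("transform", "EXHAUSTED"): "get",
    ("output", "OKAY"): "transform",
    ("get", "OKAY"): "transform",
    ("get", "EOF"): None,            # assumed outcome: no more text, request ends
}


def run_supervisor(users, actions, diagram, start="transform"):
    """Interleave the actions of all users' requests, one action per step."""
    queue = deque((user, start) for user in users)
    trace = []
    while queue:
        user, action = queue.popleft()
        outcome = actions[action](user)
        trace.append((user, action, outcome))
        following = diagram[(action, outcome)]
        if following is not None:
            queue.append((user, following))
    return trace


def make_actions(buffers):
    """Toy functional units; `buffers` maps user -> list of buffers of lines (consumed)."""
    current = {u: [] for u in buffers}
    pending, printed = {}, {u: [] for u in buffers}

    def transform(u):
        if not current[u]:
            return "EXHAUSTED"
        pending[u] = current[u].pop(0).upper()   # stand-in for real formatting
        return "OKAY"

    def output(u):
        printed[u].append(pending.pop(u))
        return "OKAY"

    def get(u):
        if not buffers[u]:
            return "EOF"
        current[u] = list(buffers[u].pop(0))
        return "OKAY"

    return {"transform": transform, "output": output, "get": get}, printed


actions, printed = make_actions({"u1": [["a", "b"]], "u2": [["c"]]})
trace = run_supervisor(["u1", "u2"], actions, FORMAT_DIAGRAM)
```

The trace alternates between the two users, so different parts of separate requests are in progress at once even in this sequential model.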

This supervisory model forms the basis for a multiuser interactive system. (More about this approach can be found in [1,2].) Here the actions associated with the executing request of one user can be overlapped with the actions associated with the executing requests of the other users. Thus we have a pipeline organization where we are always executing different parts of separate requests in parallel.

Expanding on this structure yields a machine architecture as pictured in Figure 7. Users enter requests to create, edit and process text to be formatted. Output can appear on either the initiating terminal or on a line printer.

The Supervisor controls the sequence of action executions while the functional units realize the actions in terms of micro-orders, register transfers, etc. As an example, let us specify in micro-orders the semantics of the transform action:

transform(line) =
    Bank_select <- get_queue();
    if ((LCR) = 0)
        outcome(EXHAUSTED);
    else {
        MDR <- MDR o LM[LAR];
        MDR <- TRAN(MDR);
        LAR <- LAR + 1;
        LCR <- LCR - 1;
        outcome(OKAY);
    }

Here outcome(code) places a return code on the Request Queue, get_queue() gets the next entry from the transform action queue, Bank_select is a register which indexes the appropriate user's register set, the operation 'o' concatenates the contents of the next line in line memory to the MDR, and TRAN(MDR) provides the combinational logic function to do the line filling and manipulating operations.
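
The micro-order sequence for transform can be transliterated into Python to check its behavior step by step. The register names follow the text; the `RegisterSet` class, the use of plain lists for the two queues, and passing TRAN in as a function are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RegisterSet:
    """One user's slice of the register bank (only the registers used here)."""
    lar: int = 0     # line address register: index of the next line in line memory
    lcr: int = 0     # line count register: lines left in the user's buffer
    mdr: str = ""    # memory data register


def transform(bank, line_memory, action_queue, request_queue, tran):
    """Execute one transform action, mirroring the micro-orders line by line."""
    user = action_queue.pop(0)                      # Bank_select <- get_queue()
    regs = bank[user]
    if regs.lcr == 0:
        request_queue.append((user, "EXHAUSTED"))   # outcome(EXHAUSTED)
        return
    regs.mdr = regs.mdr + line_memory[regs.lar]     # MDR <- MDR o LM[LAR]
    regs.mdr = tran(regs.mdr)                       # MDR <- TRAN(MDR)
    regs.lar += 1                                   # LAR <- LAR + 1
    regs.lcr -= 1                                   # LCR <- LCR - 1
    request_queue.append((user, "OKAY"))            # outcome(OKAY)


bank = {"u1": RegisterSet(lar=0, lcr=1)}
line_memory = ["hello "]
actions, requests = ["u1", "u1"], []
transform(bank, line_memory, actions, requests, str.upper)  # str.upper stands in for TRAN
transform(bank, line_memory, actions, requests, str.upper)
```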


8. GENERALIZATIONS: OTHER APPLICATIONS OF
THE ARCHITECTURE

There are several essential features in the design of the FLU which suggest generalizations to functions other than document preparation. First, the FLU operates on a unit of data which is much larger than the elemental storage component typically processed at the instruction level of the computer. This can be considered the outer loop of the FLU control. Second, within the unit of data being processed by the FLU there is a functional pattern suggesting iterative decompositions (which can be parallel or series-parallel) which are amenable to replication at the component level. Third, within the data unit processed by the FLU there is a combination of data and control elements similar to a tagged architecture.

The general action of the FLU may thus be understood at the outer level of control as:

    while (FOREVER) {
        if (DATA UNIT NOT COMPLETE)
            FETCH MORE INPUT;
        else
            PROCESS THE DATA UNIT;
    }

which in the specific document preparation component case becomes:

    while (FOREVER) {
        if (OUTPUT LINE NOT COMPLETE)
            FETCH ANOTHER INPUT LINE;
        else
            GENERATE AN OUTPUT LINE;
    }


In either case the input is a sequence of flytes in which the data and control are intermixed, and the output is a sequence of data flytes. The relationship between the size of the input quantum, the size of the output quantum, and the intermediate storage within the FLU must be studied to obtain optimal performance.
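
The outer control loop lends itself to a generic software formulation in which the completeness test and the processing step are supplied as parameters. The following is a sketch only: the function and parameter names are invented here, and an end-of-input exit has been added because a program, unlike the hardware loop, must terminate.

```python
def flu_outer_loop(fetch, is_complete, process):
    """Yield processed data units, fetching more input whenever the unit is incomplete.

    fetch() returns the next input quantum or None at end of input;
    is_complete(unit) tests whether enough input has accumulated;
    process(unit) returns (output, leftover).
    """
    unit = ""
    while True:
        if not is_complete(unit):
            more = fetch()
            if more is None:          # assumed exit; the original loop runs forever
                if unit:
                    yield unit
                return
            unit += more
        else:
            output, unit = process(unit)
            yield output


# Example: re-block arbitrary input quanta into fixed 10-character output quanta.
chunks = iter(["abcdefg", "hijklmnop", "qrs"])
units = list(flu_outer_loop(
    fetch=lambda: next(chunks, None),
    is_complete=lambda u: len(u) >= 10,
    process=lambda u: (u[:10], u[10:]),
))
```

The leftover returned by `process` plays the part of the intermediate storage inside the FLU, and the example makes visible how input and output quantum sizes interact.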


The same processing loop is applicable to a variety of programs in UNIX which have essentially this overall control structure, such as awk [10], sed [9], and grep [9]. (Awk and sed analyze text line by line, while grep searches lines to detect a pattern.) The FLU can be adapted to a variety of programs by having the cascade control logic of the S2 unit under microprogram control.


9.  CONCLUSION 

Our architectures provide for a synthesis of very large scale integrated circuit technologies and program structure concepts to respond to the needs of office automation.

The macro-architecture and the micro-architecture which we have described combine to provide a state-of-the-art unit suited to an increasing number of applications.

ABSTRACT

This paper discusses the design of a special purpose computer to be used in the scanning of text. The design of this machine allows it to operate at a reasonably high level when performing text searches. This capability not only simplifies the requirements of the translation process used to derive machine code from user enquiries but also enhances the speed of the device, which is an essential feature if data is to be scanned while being taken from a rotating storage medium. Of special interest is the design of the term-detection unit, which incorporates features which should be of use in a direct-execution architecture, specifically those modules which are responsible for the recognition of keywords and tokens in a stream of source text.


INTRODUCTION 

In the past few years we have seen a growing involvement with systems which have as their main function the scanning of extremely large data bases of textual information containing perhaps billions of characters. Examples of such applications include text retrieval systems for intelligence reports, treatises and corpora in law libraries, medical bibliographic services, and large repositories of newspaper articles.

This literature searching is mainly characterized by the fact that the textual information is not structured. Due to the way the information is collected and because of the nature of the information it is usually difficult to provide adequate cost-effective indexing systems. Consequently, if there is any subdivision of the information content, it will be such that the information is grouped into categories which are very extensive in scope. In such a situation, the literature search is accomplished by scanning the entire text. Information is extracted when it satisfies the requirements of a user query, which should specify a sufficient number of constraints on the search to produce the required documents and little else.

The internal formatting of the text may be rather inconvenient and limited to standard punctuation, although special characters may be used to delimit and hence define various text groupings such as sentences, paragraphs, sections, documents, etc.

Various papers [1,2,3,4,5,6] have discussed a variety of architectures
for text retrieval, and in [7] Hollaar discusses the problems
associated with such endeavours and presents a survey of some of the
architectures which are of current interest. In [8] Chu suggests that
research should explore the hardware/software trade-offs for
particular applications involving high-level constructs. This paper is
essentially an attempt to bring some of the high efficiency and high
performance aspects of direct-execution architecture to the special
purpose application of text scanning.


SYSTEM  FUNCTIONS 

In text retrieval systems, a three step process is involved in the
capture of textual information:

1)  query  translation 

2)  term  detection 

3)  query  resolution 

The user terminal (see fig. 1) passes to the system an information
request which is expressed as a query. Examples of such queries are as
follows:

A    Keyword Search

Retrieve any document that contains the character string A.

(A,B,C,D)#n    Threshold 'OR'

Retrieve any document that contains at least n of the different
character strings A, B, C, D. Note that if n=1 this is an "OR"
operation; hence the retrieved document contains one or more of the
strings A, B, C, or D. If n equals the number of entries in the list,
then this is an "AND" operation; the retrieved document must contain
all the indicated strings in any order.

A AND NOT B    Logical Expressions

Retrieve any document that contains the character string A but not the
character string B.

<A,B>#n    Directed Proximity

Retrieve any document that contains the character string A followed by
the character string B within n characters.

(A,B)#n    Undirected Proximity

Retrieve any document that contains character strings A and B within n
characters of each other.

A!!!B    "Don't Care" Characters

Retrieve the document with the character string A followed by three
arbitrary characters followed by the character string B.
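
The query forms listed above can be illustrated with a small software
sketch. This is only an illustration of the intended semantics, not
the hardware described in this paper. The function names are
hypothetical, and the sketch assumes that "within n characters" is
measured from the end of the first string to the start of the second,
which the text does not state.

```python
from typing import List


def find_all(text: str, term: str) -> List[int]:
    """Return every start offset at which term occurs in text."""
    hits, start = [], text.find(term)
    while start != -1:
        hits.append(start)
        start = text.find(term, start + 1)
    return hits


def keyword(doc: str, a: str) -> bool:
    """A: the document contains string A."""
    return a in doc


def threshold_or(doc: str, terms: List[str], n: int) -> bool:
    """(A,B,C,D)#n: at least n of the different strings are present."""
    return sum(1 for t in set(terms) if t in doc) >= n


def and_not(doc: str, a: str, b: str) -> bool:
    """A AND NOT B: A is present and B is absent."""
    return a in doc and b not in doc


def directed_proximity(doc: str, a: str, b: str, n: int) -> bool:
    """<A,B>#n: A is followed by B, with at most n characters between
    the end of A and the start of B (assumed interpretation)."""
    return any(0 <= j - (i + len(a)) <= n
               for i in find_all(doc, a) for j in find_all(doc, b))


def undirected_proximity(doc: str, a: str, b: str, n: int) -> bool:
    """(A,B)#n: A and B lie within n characters of each other, either order."""
    return directed_proximity(doc, a, b, n) or directed_proximity(doc, b, a, n)


def dont_care(doc: str, a: str, b: str, gap: int = 3) -> bool:
    """A!!!B: A, then exactly `gap` arbitrary characters, then B."""
    return any(doc.startswith(b, i + len(a) + gap) for i in find_all(doc, a))
```

With n=1 the threshold function behaves as an "OR", and with n equal
to the number of terms it behaves as an "AND", as noted above.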

In the next step, the query translator will create the necessary
machine code and will send it (along with the required data items) to
the query resolution module, which guides the behavior of the control
unit in the term detector and gathers responses from the term detector
in order to resolve queries.

Since it is necessary to scan a vast amount of text, a high speed of
execution in the term detector and query resolution modules is of
utmost importance. In this design, the scan operations are designed to
function at a reasonably high level. During most of the time a search
operation will be carried out as the execution of one instruction in
the search control unit. If the input text currently being examined
contains characters that produce a successful match with a given term,
then the execution of various instructions may be effected in order to
accomplish some aspect of the query resolution, but in most
circumstances the microcode executed during a scan instruction will
rapidly skip over text characters which do not match with any of the
given terms. As we shall see, it is possible to design hardware
facilities which will accomplish some of the query resolution without
resorting to the execution of code in the Query Resolution Processor
(QRP).

The modular structure of the nucleus of the text scanning system is
presented in fig. 2. Because of its functional capabilities, it
includes the term detection unit of fig. 1 and, in addition to this,
it also involves some aspects of the query resolution block.


The term detection module receives text from a suitable source and
attempts to match character substrings in this text with the character
string terms stored in the string memory contained within the module.

When a successful match is detected, the match line is given an active
signal and the memory address of the matching string is passed down to
the status FIFO so that, if necessary, the match can be "logged" for
future use by the QRP. The address is also passed to the Interrupt
Generation Unit (IGU), which can be used to implement the
"threshold-or" function mentioned earlier. The IGU also decides
whether the address is to be logged in the status FIFO.

The delimiter detection unit issues interrupts whenever a delimiter
passes in the text stream. It is mainly used to detect the beginning
of successive documents in the source text, since many of the queries
will be related to the contents of a document.

Thus, an interrupt can be initiated for any one of the following
events:

a)  Detection of a delimiter

b)  Detection of a term

c)  Completion of a threshold-or during passage of a document.

In all cases, an interrupt line causes the QRP to acknowledge an event
which is important to the resolution of a query. If it cannot
immediately deal with such an event, all pertinent information is
temporarily logged as status in the FIFO buffer until the QRP can find
the time to accept it.


TERM  DETECTION 

The input to the term detector is taken from a source, for example a
disk drive, which can issue a serial stream of characters. It is
anticipated that the amount of processing time required between
character shifts will be less than 400 nanoseconds. Since typical
transfer rates for a disk are about one byte per microsecond, this
system should be able to accept data directly from a disk without the
need for buffer memories or FIFOs.

The heart of the term detector consists of a lengthy shift register
which shifts in source text one byte (a single character) each time a
shift operation is issued by search control. The shift register is
capable of holding 32 characters which are available from the
"parallel-out" lines of the shift register. These 32 characters can be
compared with any one of 256 strings (or terms) in a "string memory"
which has a data bus capable of dealing with 32 characters in
parallel. Comparisons are accomplished by a linear array of
comparators placed between the string memory and the shift register.
It is anticipated that each character position in the string memory
will involve a 7 bit ASCII code and an additional bit used to signify
a "don't care" or unconditional match character.

The string memory is a standard static RAM, since the use of
associative memory for this function would be very costly at the
present time. However, it is obvious that some type of parallel search
must be made and consequently, the parallel outputs from the middle
four character positions of the shift register lead to an associative
memory which also has a word depth of 256. We will refer to these four
characters as the "partial match" characters. Prior to the scan
operation, the system will ensure that the term in word n of the
string memory corresponds to the four partial match characters of word
n in the associative memory (CAM) (see fig. 3). As text streams
through the shift register, a comparison can be effected between the
partial match outputs and all the words in the associative memory. If
a match is detected, the address of the matching word is derived from
an encoder which is driven by the match outputs of the associative
memory. This address is fed to the string memory so that another
comparison can be accomplished, this time involving the full string.
This final full comparison will determine whether the contents of the
shift register contain one of the terms required by the user query.
With a suitably fast RAM for the string memory, both comparisons can
be easily accomplished in the time interval between successive shifts
as characters stream off disk.
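
The two-stage comparison described above can be mimicked in software.
The following is a minimal sketch under stated assumptions, not a
model of the actual circuitry: the class and function names are
hypothetical, the CAM is stood in for by a dictionary, the "middle
four" positions are assumed to be 0-based positions 14 to 17 of the
32-character register, and "!" is used as the don't-care marker.

```python
from typing import Dict, Iterator, List, Tuple

WIDTH = 32             # characters held in the shift register
MID = slice(14, 18)    # assumed: the "middle four" are 0-based positions 14..17
DONT_CARE = "!"        # marker for an unconditional-match position
FILL = "\0"            # filler shifted in before and after the real text


def place(term: str, lead: int) -> str:
    """Pad a term with don't-care positions to make one 32-character word."""
    return DONT_CARE * lead + term + DONT_CARE * (WIDTH - lead - len(term))


class TermDetector:
    def __init__(self, words: List[str]) -> None:
        self.string_memory = list(words)      # ordinary RAM, one word per term
        self.cam: Dict[str, int] = {}         # partial-match key -> address
        for address, word in enumerate(words):
            key = word[MID]
            if len(word) != WIDTH or DONT_CARE in key or key in self.cam:
                raise ValueError(f"unusable word at address {address}")
            self.cam[key] = address

    def scan(self, text: str) -> Iterator[Tuple[int, int]]:
        """Yield (shift_count, address) for every verified term match."""
        register = FILL * WIDTH
        for shift_count, ch in enumerate(text + FILL * WIDTH, start=1):
            register = register[1:] + ch            # one shift per character
            address = self.cam.get(register[MID])   # stage 1: partial match
            if address is None:
                continue
            word = self.string_memory[address]      # stage 2: full compare
            if all(w in (DONT_CARE, r) for w, r in zip(word, register)):
                yield shift_count, address
```

For example, a detector loaded with place(" GUYS AND DOLLS ", 13) and
place(" A CHORUS LINE ", 13) reports addresses 0 and 1 when scanning
"SEE GUYS AND DOLLS OR A CHORUS LINE TONIGHT". A text containing
"GUYS AND DOLLZ" passes the partial match but is rejected by the full
comparison.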

Our only constraint is that all words in the associative memory be
unique. Since most terms in the string memory are not going to be a
full 32 characters in length, we should be free to locate a term
within its word so that it assumes a position such that the four
characters in the partial match positions are different from all the
rest.

For example, suppose we are searching the data base for the following
ten terms:

" GUYS AND DOLLS "

" THE NIGHT OF THE IGUANA "

" A STREETCAR NAMED DESIRE "

" WHAT MAKES SAMMY RUN? "

" THE DIARY OF ANNE FRANK "

" A LITTLE NIGHT MUSIC "

" SWEET CHARITY "

" THE UNSINKABLE MOLLY BROWN "

" A CHORUS LINE "

" DON'T BOTHER ME, I CAN'T COPE "


Successive words in the string memory might be set up as:

!!!!!!!!!!!!! GUYS AND DOLLS !!!
!!!!!!! THE NIGHT OF THE IGUANA 
!!!!!! A STREETCAR NAMED DESIRE 
!!!!!!!!! WHAT MAKES SAMMY RUN? 
!!!!!!! THE DIARY OF ANNE FRANK 
!!!!!!!!!! A LITTLE NIGHT MUSIC 
!!!!!!!!!!!!! SWEET CHARITY !!!!
!!!! THE UNSINKABLE MOLLY BROWN 
!!!!!!!!!!!!! A CHORUS LINE !!!!
! DON'T BOTHER ME, I CAN'T COPE 

while the successive words in the associative memory would be:

"GUYS"
"GHT "
"TCAR"
" MAK"
"ARY "
"ITTL"
"SWEE"
"KABL"
"A CH"
" ME,"

The ! in the above list represents a don't care or unconditional match
character.

As can be seen in the above example, each entry in the partial match
columns within the associative memory is selected from the
corresponding character positions of terms in the string memory. After
a parallel comparison of source with all words in the associative
memory, a successful match will simply indicate a matching substring
and the address of the parent term containing that substring. One more
comparison with the parent term in the string RAM will serve to verify
whether the complete term is in the source text.
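
The placement constraint can be checked mechanically. The sketch below
rebuilds the ten 32-character words of the example, extracts the
partial match characters, and also shows one possible (greedy) way of
choosing positions automatically. It assumes that the partial match
positions are 0-based positions 14 to 17 and that "!" marks a don't
care; the function names and the greedy strategy are illustrative and
are not taken from the design itself.

```python
from typing import List, Tuple

WIDTH, MID_START, MID_LEN = 32, 14, 4   # assumed register geometry

# (number of leading don't-care positions, term) as read from the example
PAPER_LAYOUT: List[Tuple[int, str]] = [
    (13, " GUYS AND DOLLS "),
    (7, " THE NIGHT OF THE IGUANA "),
    (6, " A STREETCAR NAMED DESIRE "),
    (9, " WHAT MAKES SAMMY RUN? "),
    (7, " THE DIARY OF ANNE FRANK "),
    (10, " A LITTLE NIGHT MUSIC "),
    (13, " SWEET CHARITY "),
    (4, " THE UNSINKABLE MOLLY BROWN "),
    (13, " A CHORUS LINE "),
    (1, " DON'T BOTHER ME, I CAN'T COPE "),
]


def build_word(lead: int, term: str) -> str:
    """Surround a term with don't-care characters to fill one word."""
    return "!" * lead + term + "!" * (WIDTH - lead - len(term))


def partial_key(word: str) -> str:
    """Return the four characters that would be stored in the CAM."""
    return word[MID_START:MID_START + MID_LEN]


def place_terms(terms: List[str]) -> List[str]:
    """Greedy placement: slide each term until its key is new.

    A greedy choice can fail even when some valid arrangement exists;
    it is used here only because it is short.
    """
    used, words = set(), []
    for term in terms:
        for lead in range(WIDTH - len(term), -1, -1):
            word = build_word(lead, term)
            key = partial_key(word)
            if "!" not in key and key not in used:
                used.add(key)
                words.append(word)
                break
        else:
            raise ValueError(f"no unique position found for {term!r}")
    return words


paper_words = [build_word(lead, term) for lead, term in PAPER_LAYOUT]
paper_keys = [partial_key(word) for word in paper_words]
```

Under these assumptions every key in paper_keys is distinct, which is
exactly the uniqueness condition stated above.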

It should be noted that in the interest of clarity we have omitted
from fig. 3 the additional circuitry required to perform a write
operation into the associative memory. Prior to the search, the
control unit will define both string memory and associative memory by
shifting each query term into an appropriate position within the shift
register, whereupon a write operation may be executed.


THE INTERRUPT GENERATION UNIT

When the term detector places an active signal on the match line, it
is an indication to the rest of the system that the value currently on
the address bus is the address of a location in string memory
containing a required term. At this time, such an address is accepted
by the Interrupt Generation Unit (IGU) and used to aid the processing
of a query resolution.

The activity to be initiated by this detection is defined by the
contents of a RAM which is established prior to the search. The
address value from the term detector is used to access a 6 bit word
which is used to control the following two activities:

a)  Interrupt  enable 

If this bit is set, an interrupt signal is issued to the query
resolution processor (QRP). The QRP can then act on the presence of
the indicated term by executing code associated with the resolution of
some particular query.

b)  Hardware execution of the "threshold-or"

Another bit in the IGU RAM, the "threshold-or enable", is used to
determine whether or not the detection of this term is to be
accompanied by the decrementing of a counter which is responsible for
the maintenance of the term count associated with a particular
threshold-or. The remaining four bits of the word select (via a
decoder) one of sixteen counters. Each counter is programmable and can
be loaded from the data bus coming from the QRP. Each counter is four
bits long, and hence the maximum threshold allowed in such a query is
16.

Since a particular term in any document must decrement the selected
counter once and only once, a separate RAM maintains a "hit-list". At
the start of a document, all entries in this RAM are set to zero. When
a term is first detected (match line high), the presence of a zero
from the hit-list and a one from the threshold-or enable bit will
cause the selected counter to decrement. This cycle is immediately
followed by a cycle which writes a 1 bit into the hit-list, and hence
any future detection of the term within the same document will not
produce an active level on the decode enable line.

In actual practice, it may be necessary to duplicate the hit-list
facility since it must be cleared between documents. Consequently, it
may be necessary to clear one list while the other is being used.
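
The behavior of the IGU word, the counters and the hit-list can be
summarized in a short software sketch. It is an illustration only: the
class and field names are hypothetical, the event labels "term" and
"threshold" are invented for the example, and the sketch assumes that
the QRP reloads a counter whenever a new count is wanted, a point the
text leaves open.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class IguWord:
    """One 6-bit IGU RAM word: two enable bits plus a 4-bit counter select."""
    interrupt_enable: bool
    threshold_enable: bool
    counter_select: int          # 0..15, picks one of sixteen counters


class InterruptGenerationUnit:
    def __init__(self, ram: Dict[int, IguWord]) -> None:
        self.ram = ram                       # indexed by string-memory address
        self.counters = [0] * 16             # loaded by the QRP before a search
        self.hit_list: Set[int] = set()      # terms already seen in this document

    def load_counter(self, index: int, threshold: int) -> None:
        self.counters[index] = threshold

    def start_document(self) -> None:
        """Clear the hit-list, as required between documents."""
        self.hit_list.clear()

    def on_match(self, address: int) -> List[str]:
        """Process one match and return the interrupts it raises."""
        word = self.ram[address]
        events: List[str] = []
        if word.interrupt_enable:
            events.append("term")
        # Decrement only on the first detection of this term in the document.
        if word.threshold_enable and address not in self.hit_list:
            self.hit_list.add(address)
            c = word.counter_select
            if self.counters[c] > 0:
                self.counters[c] -= 1
                if self.counters[c] == 0:
                    events.append("threshold")   # threshold-or completed
        return events
```

A repeated detection of the same term leaves the counter untouched, so
a threshold of two is only reached once two different terms of the
list have been seen in the current document.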

Finally, it should be noted that a pipeline effect can be incorporated
into the design. Once the match address is available, it can be
latched for use by the IGU, and in this way the activity of the IGU
and the processing of the next character in the term detection unit
may be overlapped.


CONCLUSION 

We have presented a design for a text scanner which uses a term
detection unit incorporating random access memory and associative
memory in a cost effective manner. An additional module, referred to
as the interrupt generation unit, contributes information which
greatly enhances the system implementation of high level queries such
as the threshold-or.


REFERENCES 

1.  Hollaar, L. A., "Rotating Memory Processors for the Matching of
    Complex Textual Patterns," The Fifth Annual Symposium on Computer
    Architecture, April 1978.

2.  Mukhopadhyay, A., "Hardware Algorithms for Non-numeric
    Computation," The Fifth Annual Symposium on Computer Architecture,
    April 1978.

3.  Roberts, D. C., (ed) "A Computer System for Text Retrieval: Design
    Concept Development," Report RD-77-10011, Office of Research and
    Development, Central Intelligence Agency, Washington, D. C., 1977.

4.  Roberts, D. C., "A Specialized Computer Architecture for Text
    Retrieval," Proc. Fourth Non-Numeric Workshop, Syracuse, N. Y.,
    Aug. 1978, pp. 51-59.

5.  Stellhorn, W. H., "A Processor for Direct Scanning of Text,"
    presented at the First Non-Numeric Workshop, Dallas, Oct. 1974.

6.  Foster, M. J. and Kung, H. T., "Design of Special-Purpose VLSI
    Chips: Example and Opinions," Technical Report CMU-CS-79-147,
    Department of Computer Science, Carnegie-Mellon University.

7.  Hollaar, L. A., "Text Retrieval Computers," Computer, Vol. 12,
    No. 3, 1979, pp. 40-50.

8.  Chu, Y., "Direct-Execution Computer Architecture," Information
    Processing 77, IFIP, North-Holland Publishing Co. (1977) pp. 7-12.




A  COBOL  MACHINE  DESIGN  AND  EVALUATION 


Masahiro YAMAMOTO, Ryosei NAKAZAKI
Minoru YOKOTA, Mamoru UMEMURA


Nippon Electric Co., Ltd., Central Research Laboratories
4-1-1 Miyazaki, Takatsu-ku, Kawasaki 213, JAPAN


Abstract 

A COBOL machine applicable to an attached processor has been
developed. It is characterised by having intensive COBOL machine
architecture (COMBAT), highly-specialised hardware structure and
compact and efficient host processor interface.

COMBAT architecture has many facilities for efficient COBOL program
execution: many internal data, highly functional data descriptors and
intensive instructions. COMBAT machine is functionally composed of
three processor modules (IFPM, OFPM and EXPM), highly specialised for
their functions.

It is found that average COBOL statement execution time is 35% of host
processor execution time. A COMBAT machine attains better
cost/performance and is useful for a special COBOL processor attached
to a medium or large-scale computer.

Introduction 

Recent advances in solid-state technology and the software crisis due
to increases in computer applications are accelerating the research
and development of high-level language machines. From the viewpoint of
their utilization style, high-level language machines are classified
into two categories: a stand-alone processor, and an attached
processor or an element processor of a distributed-function computer
system [1]. Burroughs B1700 [2] and NCR COBOL Virtual Machine [3] are
typical examples of a stand-alone high-level language machine. PASCAL
Microengine [4] from Western Digital Corp. is also a recent
interesting product, applied to microcomputer applications.

On the other hand, current marked decreases in the cost of hardware
and the advent of highly functional processor modules make it not only
technically feasible, but economically practical, to develop the
attached high-level language machine. Taking this trend into
consideration, a COBOL machine applicable to an attached processor has
been implemented.

In order to attain better cost performance in a high-level language
machine, machine architecture and hardware structure design, based on
the actual user environment, are important. For this purpose, an
analysis tool is implemented. The analysis tool gathers COBOL user's
program profile, including COBOL verbs, operand data attributes and so
on.

With the help of this tool, a COBOL machine architecture, highly
optimized for COBOL program processing, and a COBOL machine hardware
structure, greatly specialized for its machine architecture, are
obtained.

The COBOL machine can effectively execute major COBOL processing.
However, input-output operations, communication control, data base
management, software-level virtual memory management and so on are
required of a host processor. Therefore, in a high-level language
machine for an attached processor, a highly effective, compact and
flexible process switching mechanism between an attached processor and
a host processor is required. In order to accomplish this function, an
effective connection interface at the internal bus and firmware level
is provided.

COBOL user's programs are translated into highly functional COBOL
machine instructions by a software translator, which runs on a host
processor.

As an evaluation criterion of high-level language machine
architecture, IPF (Instructions Per Function), which indicates how
many machine instructions correspond to a source statement, is
selected. IPF means machine architecture language proximity. In order
to evaluate IPF value and object memory capacity per COBOL statement,
an evaluation tool is implemented.
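
As a rough illustration of the IPF criterion, the value can be taken
as the mean number of machine instructions generated per source
statement. The function name below is hypothetical, and the counts in
the usage note are made-up numbers, not measurements from the COMBAT
evaluation tool.

```python
from typing import Sequence


def instructions_per_function(instruction_counts: Sequence[int]) -> float:
    """Mean number of machine instructions per source statement.

    instruction_counts[i] is the number of machine instructions that
    statement i was translated into. A value close to 1.0 means the
    machine architecture is close to the source language.
    """
    if not instruction_counts:
        raise ValueError("at least one statement is required")
    return sum(instruction_counts) / len(instruction_counts)
```

For instance, five statements translated into 1, 1, 2, 1 and 3
instructions (hypothetical figures) give an IPF of 8 / 5 = 1.6.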

At present, the COBOL machine is running as a processor attached to a
host processor, in which a medium-scale conventional commercial
computer (NEAC ACOS series 77 Model 300) is used as a base computer.
In the host processor, therefore, FORTRAN, PL/I and COBOL program
execution are possible, as well as COBOL program compilation.

As a result of this attachment, COBOL program execution is removed
from the host processor and carried out by the COBOL machine. This
results in host processor performance enhancement for throughput and
turnaround time.

In the following sections, a COBOL machine architecture, COMBAT (COBOL
Oriented Machine Basic ArchiTecture), a machine hardware structure,
host processor interface and evaluation results are described.

System  Overview 

Figure 1 shows COMBAT system configuration, including analysis and
evaluation tools. The COMBAT system is composed of COMBAT translator
and COMBAT machine connected to a host processor.

COBOL programs are translated into highly functional COMBAT
instructions by a software COMBAT translator, whose language
specification is compatible with a host processor (ANSI 74 COBOL [5])
for practical use and impartial evaluation of the system. The higher
the functional level of a high-level language machine architecture
becomes, the simpler a translator becomes. A translator is composed of
a high-level language dependent part and a target machine dependent
part. In the COMBAT translator, the processing time and memory
capacity for the latter part are greatly reduced due to its high
functionality.

Fig. 1 COMBAT System Configuration and Analysis/Evaluation Tools

Highly functional machine architecture, including host processor
interface, and intensive hardware structure, closely related to
machine architecture, are required for an attached high-level language
machine. The COMBAT machine can effectively perform major COBOL
functions for data manipulation, table handling, arithmetic operations
and conditional operations. Moreover, a high performance and compact
host processor interface is provided so that other features can be
executed by the host processor, e.g. input-output operations,
communication control and virtual memory management at the software
level. A host processor call instruction in the COMBAT machine
realises this function.

COMBAT machine architecture is greatly optimized for COBOL language
processing in order to obtain high performance at the machine
architecture level. Most COBOL statements, therefore, can be
translated into a single COMBAT instruction. Various formats of
internal data directly corresponding to all user defined source data
are provided.

COMBAT machine has a hardware structure specialised for COMBAT
architecture, which is mainly composed of three functionally
distributed processor modules (IFPM: Instruction Fetch Processor
Module, OFPM: Operand Fetch Processor Module and EXPM: Instruction
Execute Processor Module).

These processor modules are also specialised for their functions using
microprogramming techniques and powerful hardware components.

Architecture and Hardware Organisation


Architecture 

A Cobol Oriented Machine Basic Architecture (COMBAT architecture) has
been specified to obtain better trade-offs between hardware and
software in high-level language processing. In high-level language
machines, it is most significant to decide how much the gap is reduced
between a source statement and a machine instruction. In order to
attain better performance, the machine instruction set is defined to
correspond to a COBOL source statement as closely as possible.
Therefore, the following functions are performed during a machine
instruction execution.

(i)   Data type conversion or adjustment.
(ii)  Indexing by index data or subscript data.
(iii) Editing required for data transfer and arithmetic operations.

Machine Instruction Format. Most COBOL source statements are
translated into a machine instruction by a software translator, which
corresponds to a conventional compiler. A machine instruction is
composed of operation code and operand syllables, as shown in Fig. 2.
If necessary, a variant syllable or operand number syllable is
appended to the operation code. Each operand syllable represents a
data item. When the operand is an element in an array, several operand
syllables are necessary to specify index or subscript data items.


[Figure: the source statement MOVE A TO B(I,J) mapped onto a machine
instruction consisting of an operation code, an operand number
syllable and operand syllables.]

Fig. 2 Source Statement and Machine Instruction Correspondence


Data and Descriptor. COBOL users can handle various data formats in a
COBOL program. Since there are only a few data formats directly
manipulated in a conventional machine, an object program should
convert them into internal formats at run time. This COMBAT machine
provides all data formats required in the ANSI 74 COBOL specification.
Table 1 lists arithmetic data formats as an example.

Descriptor architecture is adopted to facilitate more complex data
description capability for decimal scaling and editing operations.
Simple data format operand, however, can be specified without a
descriptor to avoid performance disadvantage due to using the data
descriptor.

COBOL language allows the user to describe very complex operation in a
statement. If it is translated into a single machine instruction, the
hardware design becomes too complicated. In the COMBAT machine,
complex statements are divided into several basic operations. For
example, EXAMINE or INSPECT statement functions are performed with the
combination of TALLY and REPLACE instructions. SEARCH and PERFORM
statement functions are also translated into several instructions.


Hardware Configuration

The functional level of operations performed within the COMBAT machine
is higher than that of conventional machines. The microprogramming and
pipelined architecture is suitable to effectively realize high
functionality. In the COMBAT machine, a machine instruction execution
is divided into three phases: instruction fetch, operand fetch and
execution. Each phase is executed by one of three independent
processor modules, as shown in Fig. 3: Instruction Fetch Processor
Module (IFPM), Operand Fetch Processor Module (OFPM), and Instruction
Execute Processor Module (EXPM), respectively. These processor modules
are connected with each other through First-In-First-Out (FIFO) queue
memories. This configuration is intended to be implemented with VLSI
chips.


Table 1 Arithmetic Data Formats in the COMBAT Machine

Data Format                                  COBOL Usage
-------------------------------------------  ------------------------------------
Signed Binary Short                          COMP-1
Signed Binary Long                           COMP-2
Signed Packed Decimal                        COMP-3
Signed Unpacked Decimal                      COMP/DISPLAY (SIGN IS TRAILING)
Unsigned Unpacked Decimal                    DISPLAY (NO SIGN)
Leading Signed Unpacked Decimal              DISPLAY (SIGN IS LEADING)
Separate Trailing Signed Unpacked Decimal    DISPLAY (SIGN IS TRAILING SEPARATE)
Separate Leading Signed Unpacked Decimal     DISPLAY (SIGN IS LEADING SEPARATE)


Fig. 3 COMBAT Machine System Configuration


Instruction Fetch Processor Module (IFPM). The main IFPM role is to
generate internal formats corresponding to an instruction for easy
following manipulations. The operation code and variant syllable are
packed into a 32-bit internal machine instruction, as shown in Fig. 4,
and transferred to OFPM and EXPM through machine instruction FIFOs.

The operand syllables are also packed into a 72-bit internal data
descriptor for each operand. Within this process, indexing and
subscripting are resolved and an effective operand address is located
in the internal data descriptor. Another important role is to control
the COBOL program execution sequence. Normally, IFPM continues
prefetching according to the sequence determined by statements such as
GOTO, IF and PERFORM.

[Figure: Internal Machine Instruction (bits 0 to 31) with fields OP,
VAR and N; Internal Data Descriptor (bits 0 to 71) with fields IN, ON,
TYPE, ATTRIBUTE and LOGICAL ADDRESS.]

N:  Number of operands
IN: Instruction Number
ON: Operand Number

Fig. 4 Internal Machine Instruction and Data Descriptor Format

Operand Fetch Processor Module (OFPM). The main OFPM role is to
prepare operand data for EXPM, including data fetch, validity check
and data format conversion. OFPM fetches data from a main memory. Data
contents are examined to validate them. Then, detailed operand
attributes are set into an internal data descriptor. For example, it
is determined whether data is positive, negative, all space, zero,
alphabetic or numeric. When an operand data is used in an arithmetic
operation, OFPM converts it into one of two internal data formats,
Signed Binary Long or Unsigned Packed Decimal, in order to be easily
manipulated in EXPM.

Instruction Execute Processor Module (EXPM). EXPM performs instruction
execution as a final stage in a pipeline. To achieve high performance,
EXPM installs specially designed hardware units. Transfer and editing
operations in particular are performed effectively with the aid of
those special hardware units, because these operations are most
frequently used in COBOL programs.
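
The overlap obtained by connecting the three modules through FIFO
queues can be illustrated with a small cycle-level sketch. It is a
deliberate simplification and not a model of the real machine: it
assumes that every stage takes exactly one cycle per instruction, it
routes instructions through a single chain of queues, and the function
name is hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def run_pipeline(program: Sequence[str]) -> Tuple[List[str], int]:
    """Push instructions through IFPM -> FIFO -> OFPM -> FIFO -> EXPM.

    Returns the instructions in completion order and the number of
    cycles used. Stages are served from last to first within a cycle,
    so an instruction advances by at most one stage per cycle.
    """
    pending = deque(program)       # instructions not yet fetched by IFPM
    to_ofpm: deque = deque()       # FIFO between IFPM and OFPM
    to_expm: deque = deque()       # FIFO between OFPM and EXPM
    completed: List[str] = []
    cycles = 0
    while pending or to_ofpm or to_expm:
        cycles += 1
        if to_expm:                                 # EXPM executes
            completed.append(to_expm.popleft())
        if to_ofpm:                                 # OFPM prepares operands
            to_expm.append(to_ofpm.popleft())
        if pending:                                 # IFPM fetches and decodes
            to_ofpm.append(pending.popleft())
    return completed, cycles
```

Under these assumptions n instructions finish in n + 2 cycles, against
3n cycles if the three phases were performed strictly one after
another.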

These processor modules are composed of bipolar bit-slice sequencers
(AMD 2900 series). Their instruction cycle time is 200 nsec. IFPM and
OFPM micro instruction length is 48 bits and EXPM is 72 bits long.
Control storage sizes for IFPM, OFPM and EXPM are 1K, 2K and 3K words,
respectively. IFPM and OFPM are implemented with 37 and 42 boards, on
which a maximum of 80 ICs can be installed. An EXPM is implemented
with 25 boards, which can install a maximum of 200 ICs.


Host  Processor  Interface 

The system is composed of the COMBAT machine
and a host processor, as shown in Fig. 3. In this
section, the interface between these two processors
is described. A COBOL source program must be
translated into a COMBAT object program, before
the program is processed on the COMBAT machine.

The COMBAT machine executes COBOL language processing
functions independently from the host processor.
The host processor is responsible for
this translation and also for miscellaneous functions.
For example, I/O statements (OPEN/CLOSE/
DISPLAY), inter-program control statements (CALL/
EXIT PROGRAM) and communication control statements
(SEND/RECEIVE) are categorized as such functions.
These statements are translated into HOST CALL
instructions by the translator.

The COMBAT machine is physically connected to
a host processor by a shared main-memory interface
and a shared bus interface. Data and program code
are accessed by the two processors through the shared
main-memory interface. Control signals are transferred
through the shared bus interface.


Shared Main-Memory Interface

Main-memory can be accessed by both the
COMBAT machine and the host processor. Data and
programs are located on a host virtual storage
space as a unit of segment. Therefore, it is
necessary to translate the virtual address into a
real memory address, every time a segment is accessed.
The COMBAT machine has an address translation
mechanism called Memory Processor. For
high speed translation, the Memory Processor has 8
pairs of virtual and real address mapping registers
in conjunction with an associative memory device.
Once a segment is accessed and the address translation
has been performed, the address mapping
registers contents are effective, as long as the
segment stays at a certain real memory location.

When the segments have been relocated by Virtual
Memory Manager (VMM: runs on a host processor),
the address mapping registers contents must
be cleared. Moreover, when the COMBAT machine
accesses a segment which is not in the main-memory,
the segment must be moved to the main-memory from
the secondary storage. The host processor executes
this function for the COMBAT machine (VMM CALL).
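
The behaviour of the mapping registers can be sketched as a small lookup cache. The class below is an illustration only: the name `MemoryProcessor` is taken from the text, but the `resolve_segment` callback (standing in for whatever the host does to supply a real base address), the first-in first-out replacement of a full register set, and the miss counter are all assumptions that the paper does not state.

```python
from collections import OrderedDict


class MemoryProcessor:
    """Toy model of 8 virtual-to-real address mapping register pairs."""

    def __init__(self, resolve_segment, pairs=8):
        # resolve_segment: hypothetical callback returning the real base
        # address of a segment (for example after a VMM CALL on the host).
        self.resolve_segment = resolve_segment
        self.pairs = pairs
        self.registers = OrderedDict()  # segment number -> real base address
        self.misses = 0

    def translate(self, segment, offset):
        """Translate (segment, offset) into a real memory address."""
        if segment not in self.registers:
            self.misses += 1
            if len(self.registers) >= self.pairs:
                # Assumed policy: discard the oldest mapping.
                self.registers.popitem(last=False)
            self.registers[segment] = self.resolve_segment(segment)
        return self.registers[segment] + offset

    def clear(self):
        """Invalidate all mappings, as required after segment relocation."""
        self.registers.clear()
```

A mapping stays valid across repeated accesses and is rebuilt only after `clear()`, which matches the rule that the register contents remain effective while the segment stays at the same real location.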


Shared Bus Interface

The host processor bus is directly connected to
the COMBAT machine. To control the bus, a special
host machine instruction, named SUPERVISE COMBAT,
is provided in the host processor. This instruction
is developed into a host micro-code, called
COMBAT Support Firmware. Its process flow is
shown in Fig. 5. Under the control of the COMBAT
Support Firmware, information can be transferred to
and from the COMBAT machine through the bus.
Therefore, transferring is possible if, and only
if, the host processor is executing the SUPERVISE
COMBAT instruction.


At the beginning of a COBOL program process,
COMBAT Support Program prepares the execution environment.
Segment addresses are loaded in base
registers (BRs) and other information in general
registers (GRs). Especially, BR4 is set to the top
address of the COMBAT code segment, and GR0 contents
are cleared. GR0 is used as a flag to control
the COMBAT Support Firmware execution.

Then, a host processor executes the SUPERVISE
COMBAT instruction, that is, the COMBAT Support
Firmware runs. COMBAT Support Firmware generates
an initialization signal to the COMBAT machine, and
transfers the segments' information through the
shared bus interface. The BR4 contents are transferred
to the COMBAT machine's instruction counter.


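[Figure 5: flowchart of the COMBAT Support Firmware, from
ENTRY to EXIT, with microprogram branches and halt COMBAT
steps (including the HOST CALL case)]

Fig. 5 COMBAT Support Firmware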


COMBAT  Machine  and  Host  Processor  Interaction 

A translator program, which runs on a host processor,
generates four kinds of segments. Three of them
are mainly accessed by the COMBAT machine: COMBAT
object code, COMBAT descriptor and COMBAT data
segment. The other kind is a host object code
segment called COMBAT Support Program. It includes
the SUPERVISE COMBAT instruction and other codes
for the execution of functions to be processed on
the host processor mentioned above. The COMBAT
Support Program structure is shown in Fig. 6.
Fig. 6 COMBAT Support Program


Then, the COMBAT machine starts to fetch the COMBAT
instructions and descriptors from the segments,
prepared for the COMBAT machine, through the shared
main-memory interface. After that, COMBAT Support
Firmware enters a microprogram loop until an interruption
condition occurs, either on the COMBAT
machine or on the host processor.

When an interruption condition occurs, the
COMBAT Support Firmware halts its supervising process.
At this time, the COMBAT machine cannot access
a new segment or generate a new HOST CALL
instruction to a host processor, until the COMBAT
Support Firmware restarts its process. However,
other processes inside the COMBAT machine can be
executed continuously.

A host interruption causes a microprogram
branch to an interruption process part. After
interruption process completion, microprogram control
returns to the COMBAT Support Firmware and restarts
the COMBAT supervising process, or dispatches to
another host process. In the latter case, contents
for base registers and general registers, related
to the COMBAT execution, must be saved. This process
is necessary for multi-programming control.

The COMBAT machine brings about an interruption in
two cases. One is when the COMBAT machine requires
access to a segment which is not in the main-memory.
In this case, the COMBAT Support Firmware
stops its supervising process and calls a host
Virtual Memory Manager software routine to move the
segment to the main-memory from secondary storage.
The other interruption occurs when the COMBAT
machine encounters a HOST CALL instruction in the
COMBAT code segment. This time, the COMBAT Support
Firmware completes its execution, and the COMBAT
Support Program takes a host machine cycle.

Next to the SUPERVISE COMBAT instruction in
the COMBAT Support Program is an analysis routine
for the HOST CALL parameter. The parameter is
fetched from the COMBAT code segment through the
shared main-memory interface. According to the
analysis result, COMBAT Support Program executes
one of the functions to be executed by the host
processor described before, e.g. EXIT PROGRAM,
SEND, DISPLAY, etc. After the HOST CALL instruction
execution, the COMBAT Support Program sets
the GR0 and executes the SUPERVISE COMBAT instruction
again. Detecting that the value in GR0 is not
equal to zero, the COMBAT Support Firmware skips
the initiation phase and continues its supervising
process. If the HOST CALL instruction was a STOP
RUN or ERROR instruction, COMBAT Support Program
stops its execution.
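
The role of the GR0 flag in this loop can be traced with a short simulation. This is a sketch under stated assumptions, not the actual firmware: the function name, the log strings and the idea of feeding the HOST CALLs as a list are hypothetical, and virtual-memory and host interruptions are left out.

```python
def run_support_program(host_calls):
    """Trace SUPERVISE COMBAT executions for a sequence of HOST CALLs.

    host_calls: names of the HOST CALL instructions that the COMBAT
    machine raises, in order (illustrative input only).
    """
    log = []
    gr0 = 0  # cleared before the first SUPERVISE COMBAT
    pending = iter(host_calls)
    while True:
        # SUPERVISE COMBAT: the firmware initializes only when GR0 is zero.
        log.append("initialize" if gr0 == 0 else "resume")
        # The firmware supervises until the COMBAT machine raises a HOST CALL.
        call = next(pending, "STOP RUN")
        log.append("HOST CALL " + call)
        if call in ("STOP RUN", "ERROR"):
            return log
        # The Support Program has served the call; mark GR0 as non-zero so
        # the next SUPERVISE COMBAT skips the initiation phase.
        gr0 = 1
```

Calling `run_support_program(["DISPLAY", "SEND", "STOP RUN"])` produces one "initialize" entry followed by "resume" entries, which is the skip-initiation behaviour described in the text.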

Evaluation  Results 

The COMBAT system is evaluated from the aspects
of translation from a COBOL program to COMBAT
machine instructions and their execution. For this
purpose, COMBAT translator and COMBAT machine execution
are compared with the host COBOL compiler
and host processor instruction execution, respectively.
In order to clarify the effect of an attached
high-level language machine, an attempt was
made to determine how much work load is excluded
from the host processor.

As an evaluation measure at the COBOL program
level, five typical user programs were chosen.
Also, for COBOL statement level evaluation, a COBOL
statement mix, consisting of 15 typical COBOL
statements, was selected, based on actual user
application programs.

Translator  Evaluation 

In order to clarify the difference between
COMBAT and host machine architectures, instructions
per function (IPF) have been measured.
COMBAT machine and host processor IPFs for the
statement mix are 1.7 and 5.5, respectively. These
values show remarkable COMBAT architecture proximity
to COBOL source statements. The COMBAT architecture
brings the following effects on COMBAT
translator.

• Translator program memory reduction
• Decrease in translation time
• Object program memory reduction
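
As a quick arithmetic check on the IPF figures quoted above, the following sketch computes how many host instructions correspond to one COMBAT instruction for the statement mix. The variable names are illustrative; only the two IPF values come from the text.

```python
# Instructions per function (IPF) for the statement mix, as reported.
combat_ipf = 1.7
host_ipf = 5.5

# Ratio of host instructions to COMBAT instructions for the same functions.
ipf_ratio = host_ipf / combat_ipf

# Share of instructions the COMBAT translator avoids generating.
instruction_saving = 1 - combat_ipf / host_ipf
```

Under these figures the host needs a little more than three instructions for each COMBAT instruction.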

Improvement degree for these effects is influenced
by translation processor unit, machine
architecture, translator description language and
translator design algorithm. In order to evaluate
the difference between COMBAT and host machine
architectures, COMBAT translator is composed in the
same way as the host compiler, except for the code
generation phase. These effects are evaluated with
five COBOL user programs, collected from various
application areas. Table 2 shows the results of
the COMBAT translator performance, compared with
the host compiler.

Table  2  COMBAT  Translator  Performance 


Both COMBAT translator and the host compiler
are divided into pre-code generation part and code
generation part. The pre-code generation part
design is dependent on the source language and
independent of the object machine architecture. On
the other hand, the code generation part design
depends on the object machine.

Translator Program Capacity. The instructions
per function for the COMBAT architecture is
markedly reduced. Therefore, the code generation
part capacity is 19% less than the host part capacity,
in spite of preparing unique functions for
COMBAT architecture. The unique functions include
generation of data descriptors, multi-operand
instructions and host processor codes. The COMBAT
translator pre-code generation part memory capacity
is almost the same as that for the host compiler,
because of their source language dependency and
object machine architecture independency. Total
memory capacity in COMBAT translator becomes 6%
less than the host compiler.
Translation Time. COMBAT translator execution
time is measured with software monitor and
compared with the host compiler, as shown in Table
2. COMBAT translation time, in code generation
part and whole translator, reduces to 66% and 92%,
respectively.

Object Program Capacity. COMBAT object program
capacity reduces to 59% of the host object
program, as shown in Table 2. Object program capacity
affects the performance in executing the
object programs from the effective memory use
aspect. This memory reduction brings about good
effect on program locality.

Execution  Time  Evaluation 


Execution in each processor is realized by
special hardware units, consisting of high speed
register files, programmable logic arrays, etc.

COMBAT performance improvement in the COBOL
statement mix, in which most statements are very
simple, is limited by memory access operation, as
shown in the memory usage ratio. Highly functional
COMBAT architecture and extensive COMBAT hardware
are not sufficiently utilized in this situation.
On the other hand, a COMBAT machine has highly
efficient machine instructions for complex COBOL
statements, 'STRING' and 'INSPECT', and for complex
data attribute manipulation, like a subscript and
decimal point scaling. Therefore, COMBAT performance
improvement becomes larger for these complex
statements.


Average Statement Execution Time. The average
statement execution times in COMBAT machine
and the host processor are evaluated. Memory access
time and memory usage ratio are also evaluated.
These evaluation results are shown in Table 3.


Table 3 Execution Performance for Statement Mix

                                    COMBAT   Host
Average Statement Execution Time    0.15     1.00
Memory Usage Ratio                  70%      40%
Memory Access Time                  0.80     1.00

COMBAT average statement execution time becomes
one third in comparison with the host average
statement execution time. The major reasons
for this COMBAT machine performance improvement are
considered as being:

1)  Machine  architecture 

Highly efficient COMBAT architecture leads to
less instruction fetching and data accessing operations,
due to compact object code, as shown in
Table 2. For example, most literal data are directly
described within the instruction and subscripted
data address is calculated using a data
descriptor. As a result, literal and subscripted
data are efficiently accessed.

2)  Hardware  configuration 

Memory access time from each COMBAT processor
becomes longer than that from the host processor.
In order to improve COMBAT memory access time, a
cache memory is provided. Memory access time from
the COMBAT machine with the cache memory reduces to
80% of that from the host processor, as shown in
Table 3. In spite of this memory access improvement,
the COMBAT memory usage ratio, 70%, is higher
than the host processor memory usage ratio, 40%.
This high usage ratio is accomplished due to COMBAT
machine pipeline configuration, in which each processor
independently generates memory requests and
rapidly executes each part of a COMBAT instruction.


Application Program Execution Time. In order
to make a program-level evaluation, COMBAT machine
execution times for application programs, including
input/output and other exclusive operations, are
compared with the host processor execution times.
In addition, for clarification of effects due to
the attached COBOL machine, through-put and turnaround
time improvements, for application programs
in the host processor, are measured.


Conclusion 


A COBOL machine architecture (COMBAT architecture),
a COBOL machine hardware structure (COMBAT
machine) and several evaluation results have been
presented. The COMBAT architecture and COMBAT
machine structure are specified to become optimum
from both machine architecture and hardware design
sides. The COMBAT architecture is highly optimized
for COBOL program processing. The COMBAT machine
is greatly specialized for the COMBAT architecture.

In addition, the COMBAT machine is aimed to be
mainly a COBOL machine, attached to a host processor.
Therefore, an effective and compact host processor
interface is provided.

As a result of architecture optimization and
hardware specialization, a highly efficient and low
cost COBOL machine was obtained. Moreover, a simpler
and higher performance software translator than a
conventional compiler was attained, due to high
COMBAT architecture functionality.

It was found that the COMBAT machine is useful
for a special COBOL processor attached to a medium
or large-scale commercial computer. In addition,
the COMBAT machine is applicable to use as an element
processor for a distributed-function computer
system.
Abstract

This paper is divided into two
parts. In the first part, using the method
of recursive definition, we describe the
machine language and give proper explanations.
In the second part, we briefly
discuss the main parts of the implementation
of this machine language. We do not attempt
to use this machine language as a
replacement for the concrete design of
computers, and only in principle give
a discussion of the units which must be
altered to match this machine language.
Since time and space are limited, it is
only a brief discussion. The first part
can be referenced by users and the
second part can be referenced while
designing machines.
Introduction

Generally, the languages inside the
machines are different from those outside
the machines. The internal languages pay
more attention to engineering
factors, for example, having powerful
capability of solving problems, saving
devices and so on. The external
languages consider the facility of
use. For instance, ALGOL is used to provide
the facility for scientific computation
users and COBOL is used to provide
the facility for the commercial users.

Naturally, people can consider,
firstly, whether we can slightly modify
common machine language (e.g. the language
used in the computer 104) to
bring further facility for the users,
and secondly, whether we can properly
modify common external language (e.g.
ALGOL) to execute it directly by computer
without adding too many devices.

In this paper we chiefly discuss the
second point, and the first point is
discussed in the paper "A machine language
ridding of the dependency of address."

All of the discussions are preliminary
and specialized and are not intended
to be used directly in computers.

We can imagine that we draw a line to
link two terminals: one is general
machine language and the other is ALGOL,
the first point being near the first
terminal and the second point near the
second terminal. For a specified
machine, there are a lot of points
(i.e. schemes) to be chosen; we must
choose according to concrete conditions.
For example, the price of the
components is very low, the reliability
of the components is very high and there
is an associative memory and so on. All
of these should be considered as engineering
technical conditions. ALGOL can
be chosen as a machine language for a
special scientific computation machine.

So-called computer design, essentially,
is to choose schemes based on
considering special use requirements and
technical conditions. Whether the scheme
is good or not depends not only on the
rightness of the choice but also on the
size of the choice set. In this paper,
we discuss three sorts of expressions
instead of one sort. For a particular
machine, people can arbitrarily choose one
sort, two sorts, or all the three sorts.
After we have these three sorts of synthetic
schemes, we can make it easy to
determine which one we prefer among the
seven possible schemes.
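
The count of seven follows from choosing any non-empty subset of the three sorts of expressions (3 single sorts, 3 pairs, and 1 triple). A minimal sketch that enumerates them is given below; the labels in `SORTS` are placeholders, since the actual names of the three sorts are not given at this point in the paper.

```python
from itertools import combinations

# Placeholder labels for the three sorts of expressions (assumed names).
SORTS = ("first", "second", "third")


def possible_schemes(sorts=SORTS):
    """List every non-empty choice of sorts: one sort, two sorts, or all."""
    return [
        combo
        for size in range(1, len(sorts) + 1)
        for combo in combinations(sorts, size)
    ]
```

For three sorts the list has 2^3 - 1 = 7 entries, matching the number stated in the text.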


* This paper was published in CHINA
in 1963.
ABSTRACT

This paper proposes a Tagged Architecture for a PASCAL
oriented computer architecture. All variables are
associated to Variable Descriptors and all data types
are described by Type Descriptors. The proposed
instruction set is directly defined from HLL
statements, ordering the expressions in a Polish
form and keeping inside the computer the control
structure defined by PASCAL. A hardware computer
is next presented which executes the above code by
means of five specialized microprogrammed processors
working in a pipelined manner.


INDEX  TERMS 

High Level Language computer architecture, Tagged
architecture, Self-identifying data, Polish notation,
Pipelined execution.

INTRODUCTION 

A first section in this paper presents a new approach
for the definition of an instruction set and
data representation: it is based on the principles
of tagged architecture. The second section briefly
presents the currently built PASC-HLL computer that
executes the machine language presented in Section 1.


A PASCAL programmer is allowed to "define" his own
data types (so-called Software types) by structuring
basic types (so-called Hardware types). Next he may
"declare", inside each procedure, a set of local
variables. Finally he writes his program as a structured
sequence of PASCAL instructions manipulating
his variables. Such a simple description of PASCAL
programming directly leads to a simple architecture
for a PASCAL oriented computer: its
instruction set, so-called PASC-HLL, can be reduced
to a manipulation of the programmer-defined variables
and the internal operations can be directed by the
programmer-defined data types. Such an approach is
that followed by K. JENSEN when defining the P.Code
(a Pseudo-Code for a hypothetical stack computer
[1]). Several implementations of P.Code interpreters
are available on mini or microcomputers, but
none of them really implements the original P.Code
which is based on a TAGGED architecture. Moreover
P.Code is rather far from PASCAL for its control
instructions: the original PASCAL syntax no longer
exists in P.Code. So we propose to keep PASCAL control
structures in PASC-HLL to simplify debugging
and introduce a new kind of software reliability,
by providing a computer that "knows" an expression,
an If-Then-Else or control loop structure; it is
a Syntax-oriented architecture [10].

II  -  THE  PASC-HLL  DATA  STRUCTURES 
SECTION I - PASC-HLL LANGUAGE DEFINITION


I  -  INTRODUCTION 

Classical machines work on typeless data
considered as a collection of binary digits structured
as bytes or words. Except for integers or
characters, there is no direct relation between the
H.L.L. data types and the hardware types (the ones
known by the machine). It is clear that data type
definition is the most interesting characteristic
of the PASCAL language: so it seems important to
emphasize the problem of "making the hardware suit
the language", i.e. to define hardware data types
that suit the PASCAL ones. Moreover, we try to define
an Instruction Set which suits PASCAL, in the
way that it could be the simplest and the most compact
executable code compiled from PASCAL, keeping
its structured programming feature inside the machine
itself.

* Project supported by French contract SESORI n° 78-204
Using the principles of self-identifying representation,
we propose a TAGGED architecture [2] with a
basic entity: the Variable Descriptor V.D. associated
to each declared variable. When accessing a
V.D., the machine must get enough information to perform
an operation specified by the programmer. A
fixed format was chosen to match with either 16-bit
or 32-bit words and to simplify V.D. addressing:

V.D. = (8-bit TAG, 8-bit STYPE, 16-bit SVALUE)

TAG = (I bit, P bit, 2-bit S, 4-bit HT).

II.1. Description of the variable descriptor format

a/ V.D. is the basic information referenced by the
program: it allows the machine to get all information
about the associated variable. Field TAG gives
all the hardware description: firstly, the Hardware
Type is encoded in 4-bit HT, indicating one among
16 basic types known by the machine (they are listed
in Table 1). If the variable is a PASCAL pointer,
then bit P is set. Bit I indicates whether the value
of the variable is present in field SV or not: if
not, field SV holds an address relative to a segment
whose number is given by field S, according to
Table 2.

b/ Field ST (Software Types) holds an index in the
Software Type Table where all the programmer defined
types are described. When ST is ZERO, there is
no Software Type, so the variable type is the hardware
one, for example an 8-bit integer with hardware
bounds (-128, +127). An example of a software
type descriptor is given in Fig. 1, showing an ARRAY
type descriptor holding Lower and Upper Bounds, Element
Size, Element TAG and STYPE and finally the
Array Size.

All this information is pointed to by the ST
field of an ARRAY variable whose access allows the
machine to compute different operations depending
on the operator. As an example, Bound Checking,
address calculation and building of a Variable Descriptor
for the indexed element for the INDEX
operator:

is LB <= Index <= UB ?

if yes then: SV  := SV + (Index - LB) * STEP
             TAG := Element TAG
             ST  := Element ST
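
The INDEX operator can be sketched in a few lines. This is an illustration under stated assumptions: the class and field names are hypothetical, descriptors are modelled as plain records rather than packed words, and the element address is assumed to be the array base plus (Index - LB) * STEP.

```python
from dataclasses import dataclass


@dataclass
class ArrayType:
    """Software type descriptor of an ARRAY (illustrative field names)."""
    lower: int        # LB
    upper: int        # UB
    step: int         # element size (STEP)
    element_tag: int
    element_st: int


@dataclass
class VariableDescriptor:
    tag: int
    st: int
    sv: int


def index(array_vd, array_type, i):
    """Build the descriptor of element i, with bound checking."""
    if not array_type.lower <= i <= array_type.upper:
        raise IndexError(f"index {i} outside {array_type.lower}..{array_type.upper}")
    return VariableDescriptor(
        tag=array_type.element_tag,
        st=array_type.element_st,
        sv=array_vd.sv + (i - array_type.lower) * array_type.step,
    )
```

Raising an exception on a bound violation is a choice made for this sketch; the text does not say how the machine reports the error.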

c/ Field SV (Software Value) holds either the value
of the variable (if it can be represented by 16 bits)
or the address of it in the other cases: i.e. for
long values, for indirect values necessary before
an assignment or for structured types (arrays, records
or files). A particular case is that of PASCAL
pointers: their value is an address, so bit P is
set, and bit I is set or not depending on whether the
pointer value is present or not in SV.
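
The descriptor fields described in a/, b/ and c/ can be illustrated by packing them into a single integer. The sketch assumes a 32-bit descriptor (8-bit TAG, 8-bit ST, 16-bit SV) with the fields laid out from the most significant bit in the order they are listed; the actual bit positions are not given in the text, and the function names are hypothetical.

```python
def pack_tag(i_bit, p_bit, s, ht):
    """Pack TAG = (I bit, P bit, 2-bit S, 4-bit HT) into 8 bits."""
    if i_bit not in (0, 1) or p_bit not in (0, 1):
        raise ValueError("I and P must be 0 or 1")
    if not 0 <= s < 4 or not 0 <= ht < 16:
        raise ValueError("S is 2 bits, HT is 4 bits")
    return (i_bit << 7) | (p_bit << 6) | (s << 4) | ht


def pack_vd(tag, stype, svalue):
    """Pack a Variable Descriptor (assumed order: TAG, ST, SV)."""
    if not 0 <= tag < 256 or not 0 <= stype < 256 or not 0 <= svalue < 65536:
        raise ValueError("field out of range")
    return (tag << 24) | (stype << 16) | svalue


def unpack_vd(word):
    """Split a packed descriptor back into its named fields."""
    tag = (word >> 24) & 0xFF
    return {
        "I": tag >> 7,
        "P": (tag >> 6) & 1,
        "S": (tag >> 4) & 3,
        "HT": tag & 0xF,
        "ST": (word >> 16) & 0xFF,
        "SV": word & 0xFFFF,
    }
```

With this layout a descriptor occupies two 16-bit words, the first holding TAG and ST and the second holding SV.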

II. 2.  The  PASC-HLL  stack  mechanism 

Since PASCAL is a block-structured language, the
PASC-HLL machine requires a stack mechanism
for nesting procedures during execution. If a BASE
register is associated to each Lexical Level, it is
well-known that the Internal Name of any variable
can be built as a couple (Lexical Level, Displacement)
= (LL,D), and that this name can be used during
execution to access the Variable Descriptors.
Previous implementations of that structure are well-known
(Burroughs [3], MU5 [4], etc.). However,
it is important to note that parameters must be
considered as local variables inside a called procedure,
but they must be evaluated in the context
of the calling procedure. So we define two steps:
firstly a Procedure Variable Descriptor is accessed
by a CALL (LL,D) instruction which allows the machine
to fetch and store the Formal Parameter Descriptors.
Next actual parameters can be evaluated and
assigned, after a possible conversion. Finally,
another instruction so-called ENTER can check that
the correct number of parameters was assigned; next
it computes the Mark Stack Control Word [3] and
fetches the Local Variable Descriptors before entering
the procedure code. Fig. 2 describes the stack
mechanism.

Such a structure allows a simple and compact addressing
mechanism: a positive displacement (00..63) for
local variables and a negative one (-64..-3) for parameters
is associated to the Lexical Level to form
the Variable Internal Name. Fig. 3 gives the VIN
encoding.

It is now possible to define the Access Instructions
whose operand is a Variable Name: a 6-bit opcode is
associated to a 10-bit Variable Name for 4 basic
instructions: REF and INDD allow the machine to
access a Variable Descriptor (with an indirection in
the case of INDD), ASSIGN asks the machine to assign
a new value to the variable, and CALL allows the
access to a Procedure Variable Descriptor. Other
miscellaneous access instructions are CLEAR, SET,
INCR, DECR to implement frequently used operations
on variables such as I := I+1 or I := 0 (see Table 3).
Now, we can show the simple I-PASCAL language.


III - THE PASC-HLL LANGUAGE STRUCTURE

We have just presented access instructions and
assignments. Now it is time to say that we choose
the PASCAL expression to be translated into POLISH
form expressions, and that the PASCAL control instructions
will be translated into equivalent PASC-HLL
control instructions.

It is clear that the PASC-HLL program will have the
same structure as the PASCAL program it has been
translated from. An example is given in Fig. 4 which
shows the equivalence: PASC-HLL offers the same
structured programming facilities as PASCAL, needing
the machine to base its control structure on the
principles of control segments defined by a couple:
(Entry Address, Return Address). Inside a control
segment, the Program Counter PC is incremented, but
syntactic rules must be satisfied: a Polish form
expression must be completed before an ASSIGN instruction
is fetched and an expression cannot start with
an operator; an INDEX operator cannot be applied on
any operand: its first operand must be an ARRAY, the
second one must be a SCALAR.
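
The first two syntactic rules can be checked with a simple operand count. The sketch below assumes a postfix (reverse Polish) token stream and binary operators only; the operator set and the function name are illustrative, and the type rule for INDEX (ARRAY then SCALAR) is not modelled.

```python
# Illustrative set of binary operators; not the PASC-HLL operator list.
BINARY_OPERATORS = {"+", "-", "*", "/"}


def is_complete_polish(tokens, operators=BINARY_OPERATORS):
    """Return True if tokens form one complete postfix expression.

    depth counts the values that are waiting for an operator. A binary
    operator needs two of them and leaves one result, so an expression
    that starts with an operator is rejected at once.
    """
    depth = 0
    for token in tokens:
        if token in operators:
            if depth < 2:
                return False
            depth -= 1
        else:
            depth += 1
    # Exactly one value must remain when ASSIGN is fetched.
    return depth == 1
```

An ASSIGN would be accepted only after a token sequence for which this function returns True.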


IV - THE PASC-HLL SEGMENTS

Compiling a PASCAL program generates a PASC-HLL
code segment holding a Types Descriptor Table, a
Constant Table and, for each procedure, a couple of
two elements: firstly the formal parameters and local
variables descriptors, secondly the executable
code. An element number is associated to each procedure:
it is considered as the "value" of the procedure
variable, inside its Variable Descriptor.

During execution, the PASC-HLL machine needs to
access other segments: the first one, so-called
Context, holds the nested procedure pointed to by
the BASE registers, and the second one, so-called
Dynamic, is used for dynamic allocation and accessed
by means of "pointers". The Code Segment may be duplicated
as an External Code Segment holding "system"
or "library" procedures.

So the PASC-HLL machine knows four different segments: that
feature allows it to manipulate relative short addresses which
can be translated to become absolute addresses. It is then
possible to have truly re-entrant code and data, and immediate
protection between all the segments.
SECTION 2 - PRESENTATION OF THE PASC-HLL COMPUTER

I - INTRODUCTION

After the above presentation of the PASC-HLL language that was
defined from the language PASCAL, we propose a hardware
computer to execute that machine-language. It is the Pipelined
Architecture Slice Computer for High-Level Language, so-called
PASC-HLL computer, currently built using the AM 2900 family [5].

Pipelined architecture is characterised by a high degree of
parallel operations concurrently running inside the computer
[6]. Several attempts have been made to build high performance
pipelined computers [7][8]: their efficiency depends on the way
they are programmed and requires a lot of pre-processing for
generating optimised code. The PASC-HLL computer is
characterised by a new approach, based on a natural
decomposition of the work to be executed, as explained in [9]:
"a Pipeline Polish String Computer".

II - PIPELINED EXECUTION IN PASC-HLL

The PASC-HLL order code is syntactically in Polish notation,
but execution is not made using a Push-down Stack, but a FIFO
evaluation queue. That structure makes it apparent that
desynchronisation between instruction fetch, access to Variable
Descriptors and execution of operators can be achieved. The
first station in the pipeline, so-called processor PINS, fetches
the next instruction from Main Memory: it executes the Control
Functions (loop control, conditional branch, procedure call and
return ...), and sends Access Instructions to an Access
Station, so-called processor PAC, and Operators to an Operating
Station, so-called processor POP. Internal instructions sent by
PINS to PAC or POP go through FIFO instruction queues. Variable
Descriptors fetched by PAC are sent into the FIFO evaluation
queue managed by another processor, so-called Local Storage
Processor LSP.

So, several PASC-HLL instructions are concurrently in different
steps of processing, either in Fetch, or Access, or Operation.
Moreover, processor LSP manages a FIFO Dependency Queue, which
solves the delicate problem of accessing a value which is not
yet modified, but is known to be modified just later. That
queue is made up of a Content Addressable Memory of sixteen
12-bit words that holds the Internal Names of the variables
which are to be modified or have just been assigned: it holds
the Working Set of the program, achieving good performance for
data access and reducing the number of Memory Accesses. If the
Working Set is less than 16 variables (or parameters), no
memory access is needed except for assignments: all the
Variable Descriptors are inside the CAM. Hardware design of the
PASC-HLL computer is now completed: FIVE independent
processors, realised using bipolar 4-bit slices, are controlled
by FIVE microprograms (the total size is 32 Kbits), with a
cycle time equal to 150 nanoseconds.

The PASC-HLL computer is designed to be inserted in a large
scale computing center, as a specialised CPU connected to a
"host" computer Main Memory.

The Memory Interface Processor inside PASC-HLL translates
virtual addresses sent by the PASC-HLL internal stations (PINS,
PAC and POP) into absolute memory addresses according to the
Memory allocation made by the "host" system.

The  PASC-HLL  computer  structure  is  given  by  Fig. 5. 

III - THE DEPENDENCY PROBLEM

Suppose, for example, that the high-level instruction
"X ← <expression>" has been prepared in both PAC and POP
instruction queues, or is in the process of execution by
processor POP, when an instruction of access to the variable X
enters the access processor PAC. That processor (PAC) must be
able to detect the fact that both "X ← <expression>" and
"Access X" instructions refer to the same variable X, since the
"Access X" instruction might have to be deferred until the
completion of the "X ← <expression>" instruction; otherwise the
"Access X" would not fetch the correct value of that variable,
but its old value.

The proposed solution uses a Content Addressable Memory,
organized in a FIFO mode, which holds the names of the
variables whose modification is expected (access to their value
must be deferred), and the names of the variables which have
just been modified.

When an instruction "X ← <expression>" is fetched by the access
processor PAC, a Dependency Descriptor is pushed into the
Dependency Queue. That Descriptor holds the internal name of
the variable X (i.e. a Lexical Level and an offset). From that
time, further references to variables are processed through the
Content Addressable Dependency Queue, and the name of the
variable X is known to "match" with a Dependency Descriptor. In
the meantime, processor POP might have completed the execution
of the "X ← <expression>" instruction. So processor PAC can
find either the new value of the variable (just stored by
processor POP), or a Dependency Descriptor.

The second case is processed as the creation of a Deferred
Access Descriptor. As several deferred accesses to the same
variable may occur, all the Deferred Access Descriptors related
to that variable are linked together, and they are replaced by
the new value of the variable on the completion of the expected
assignment instruction.

In the example illustrated by Fig. 6, processor POP has just
completed the modification of variable C, and it is currently
evaluating the expression to be assigned to variable D. In the
meantime, processor PAC has created a Dependency Descriptor for
variable D and it has fetched three "Access D" instructions
which have been processed as three Deferred Access Descriptors,
since the new value of D is not yet evaluated. After completion
of the modification of variable D, the state of the queue will
be as illustrated by Fig. 7.

Using the above mechanism, an "Access X" instruction can be
deferred until the completion of the "X ← <expression>"
instruction (assignment). A Deferred Access Descriptor is
pushed into the evaluation queue. All the Deferred Access
Descriptors are linked together, eliminating a number of memory
references equal to the number of linked Descriptors.
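
The deferred-access mechanism can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: the class name `DependencyQueue`, its method names, the dictionary layout of an entry, and the policy of dropping the oldest entry when the queue is full are all hypothetical, and a Python `OrderedDict` merely stands in for the content-addressable hardware.

```python
from collections import OrderedDict


class DependencyQueue:
    """Toy model of the FIFO content-addressable Dependency Queue."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()  # internal name -> entry, oldest first

    def expect_assignment(self, name):
        """PAC fetched 'name <- <expression>': push a Dependency Descriptor."""
        self.entries.pop(name, None)  # re-queue the name at the newest position
        if len(self.entries) >= self.capacity:
            # Assumption: the oldest entry is simply dropped when the queue is full.
            self.entries.popitem(last=False)
        self.entries[name] = {"pending": True, "deferred": [], "value": None}

    def access(self, name, memory):
        """PAC fetched 'Access name': return a slot holding (or awaiting) the value."""
        entry = self.entries.get(name)
        if entry is None:
            return {"value": memory[name]}  # no match: ordinary memory access
        if entry["pending"]:
            slot = {"value": None}          # Deferred Access Descriptor
            entry["deferred"].append(slot)  # linked with the others for this name
            return slot
        return {"value": entry["value"]}    # just-modified value, no memory access

    def complete_assignment(self, name, value, memory):
        """POP finished the assignment: fill every linked deferred slot."""
        memory[name] = value
        entry = self.entries[name]
        entry["pending"], entry["value"] = False, value
        for slot in entry["deferred"]:
            slot["value"] = value
        entry["deferred"] = []


# Three deferred accesses to D, resolved by one assignment.
memory = {"C": 1, "D": 2}
queue = DependencyQueue()
queue.expect_assignment("D")
slots = [queue.access("D", memory) for _ in range(3)]
before = [slot["value"] for slot in slots]
queue.complete_assignment("D", 7, memory)
after = [slot["value"] for slot in slots]
```

In this model the three deferred slots receive the new value of D from a single completion, which mirrors the saving of one memory reference per linked Descriptor described above.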
IV - THE CONDITIONAL BRANCH PROBLEM

Both PAC and POP processors may be considered as SLAVES of the
PINS processor, in that they only execute the internal
instructions that they receive from the PINS processor, which
is thus considered as the MASTER of the control. However, when
a conditional branch occurs, the PINS processor is not able to
choose the right next instruction, since the conditional
expression is being evaluated at the same time, but it can
choose one instruction among all the possible next
instructions (generally two).

The probability of a WRONG choice strongly depends on the
context in which the conditional branch occurs: it is much
lower for the end of a loop than for a classical IF-THEN-ELSE
statement. Given that the different high-level conditional
statements are distinguished by different PASC-HLL
instructions, the PINS processor knows the context and can
choose the more probable next instruction. When a choice has
been made, we say that the PINS processor enters a Conditional
State, characterized by the fact that its activity is limited
to "preparation" work. In particular, if a conditional branch
is fetched again during this conditional state, no choice is
made, but the PINS processor stops and waits for the resolution
of the first conditional branch.

When  the  value  of  the  conditional  expression  becomes 
available  in  the  POP  processor,  the  PINS  processor 
knows  whether  its  choice  was  wrong  or  not. 

In the case when the choice was right, all processors can go on
without any modification. In the other case, all the prepared
work must be disabled: this is achieved by emptying the input
instruction queues of both PAC and POP processors, which hold
wrong instructions, and by updating both evaluation and
dependency queues, in which the sequences of wrong operands or
wrong deferred variables must be deleted: this work is achieved
by processor LSP.
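
The Conditional State can be summarised as a small state machine. The following sketch is an illustration only: the class `Pins`, its method names, and the probability table are assumptions introduced here, and the numbers in the table are placeholders rather than measurements from this paper.

```python
# Assumed likelihood that each kind of conditional instruction branches.
# Placeholder values for illustration only.
ASSUMED_TAKEN_PROBABILITY = {"ENDLOOP": 0.9, "IF": 0.5}


class Pins:
    """Toy model of the PINS processor around a conditional branch."""

    def __init__(self):
        self.conditional_state = False
        self.prepared = []  # work queued for PAC/POP since the guess
        self.guess = None

    def fetch_conditional(self, kind):
        """Return the guessed outcome, or None if PINS must stop and wait."""
        if self.conditional_state:
            return None  # a second conditional branch: no choice is made
        self.conditional_state = True
        self.guess = ASSUMED_TAKEN_PROBABILITY[kind] >= 0.5
        return self.guess

    def prepare(self, instruction):
        """Queue an internal instruction as 'preparation' work."""
        self.prepared.append(instruction)

    def resolve(self, actual):
        """Condition evaluated by POP: keep or discard the prepared work."""
        kept = self.prepared if actual == self.guess else []
        self.prepared, self.conditional_state, self.guess = [], False, None
        return kept
```

A wrong guess returns an empty list, standing in for the emptying of the PAC and POP input queues; a right guess returns the prepared work unchanged.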

V - HOW TO SAVE THE EVALUATION CONTEXT

The evaluation context, represented by the intermediate state
of the evaluation queue, must be saved when a "function call"
occurs within an expression. When the "CALL instruction" is
fetched by the PINS processor, a special order is sent to the
POP instruction queue. Since several function calls can be
nested, a SAVE area is allocated on the top of a push-down
stack managed by POP. Then, processor PINS, which knows the
current state of the evaluation queue, generates a sequence of
orders towards the POP instruction queue. Thus, the current
state of the evaluation queue is saved by both POP and LSP
processors before the function is entered, all previous
intermediate results being compacted into the save area.

When the function returns, processor POP is able to restore
the values, and the evaluation process resumes.

CONCLUSION 

This paper has briefly presented both the machine language and
the computer structure of the PASC-HLL computer.

It should be mentioned that pipelined execution of Polish
Strings was already described in [9] and is not explained again
here. That design shows a new kind of machine-language, very
compact (up to 4 times more compact than a classical
machine-language) and very near the High Level Language,
bringing a new kind of software reliability during execution.
The PASC-HLL pipelined architecture is potentially capable of
high performance, since five microinstructions are executed
each cycle.

Its global performance is equal to the number of memory
accesses which are independently made by three independent
processors inside the computer, allowing an optimum use of the
Main Memory. Moreover, there is a strong relation between HLL
programming and hardware processing, which works with the
programmer-defined variables in the programmer-defined control
environment.
HT value   Hardware Type              HT value   Hardware Type

   0       8-bit integer                 8       Character
   1       16-bit integer                9       Char. String
   2       32-bit integer                A       32-bit real
   3       64-bit integer                B       64-bit real
   4       8-bit powerset                C       Boolean
   5       16-bit powerset               D       Bool. String
   6       32-bit powerset               E       Array
   7       variable length powerset      F       Record

Table 1 : the PASC-HLL Hardware Types


Tag bits
 I    P    S     comment

 0    0    --    value in SV field
 0    1    01    pointer value in SV field
 1    0    00    indirect value in CONTEXT
 1    0    01    indirect value in DYNAMIC
 1    0    10    indirect value in MAIN CODE (constant)
 1    0    11    indirect value in EXT. CODE (constant)
 1    1    00    indirect pointer value in CONTEXT
 1    1    01    indirect pointer value in DYNAMIC

Table 2 : Segment Addressing modes
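
As a reading aid for Table 2, the tag bits can be decoded with a small lookup. This is a minimal sketch, not the machine's microcode: the function name `addressing_mode` is hypothetical, and it assumes that the S bits are ignored whenever I = 0, which the table does not state explicitly.

```python
# Segment selected by the 2-bit S field when the value is indirect (I = 1).
SEGMENTS = {0b00: "CONTEXT", 0b01: "DYNAMIC", 0b10: "MAIN CODE", 0b11: "EXT. CODE"}


def addressing_mode(i, p, s):
    """Describe where a variable's value lives, from its I, P and S tag bits."""
    kind = "pointer value" if p else "value"
    if i == 0:
        # Assumption: S is not significant when the value sits in the SV field.
        return f"{kind} in SV field"
    if p and s not in (0b00, 0b01):
        raise ValueError("indirect pointers are only listed for CONTEXT and DYNAMIC")
    return f"indirect {kind} in {SEGMENTS[s]}"
```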


Instruction      meaning

REF(ll,d)        access a Variable Descriptor
INDD(ll,d)       build an Indirect Variable Descriptor
ASSIGN(ll,d)     assign a value to a Variable
CALL(ll,d)       access a Procedure Descriptor
CALF(ll,d)       access a Function Descriptor
PARAM(SP,-n)     assign a value to a Parameter
CLRV(ll,d)       clear a Variable (I:=0)
SETV(ll,d)       set a Variable (B:=true)
INCV(ll,d)       increment a Variable (I:=I+1)
DECV(ll,d)       decrement a Variable (I:=I-1)

Table 3 : the PASC-HLL access instructions
Encoded Internal name                 Comment
9 8 7 6 5 4 3 2 1 0

0 0     D                             D=0..255 for Global Variables
0 1 0   d   to   1 1 0   d            d=0..63 for local Variables
                                      d=-64..-2 for Parameters
1 1 1 0   n                           Special WITH addressing
1 1 1 1   n                           Special SP relative addressing

Fig. 3 - The PASC-HLL Variable Internal Name encoding (10-bit)


PASCAL structure                  PASC-HLL structure

X := T[I+1]-1;                    REF(T), REF(I), INC, INDEX, DEC, ASSIGN(X);

while exp do stat;                LOOP(|), exp, WHILE, stat, ENDLOOP;

for I := exp1 to exp2 do stat;    exp1, exp2, FORUP(|),
                                  ASSIGN(I), stat, ROFUP;

case exp of 0,1: stat1;           exp, CASE(|),
            3,4: stat2;           LIT(0), LIT(1), OF(|), stat1, FO,
            else: stat3           LIT(3), LIT(4), OF(|), stat2, FO,
end;                              stat3, FO;

if exp then statement;            exp, IF(|), statement, FI;

PROCNAME(exp1, exp2);             CALL(PROCNAME), exp1, PARAM(SP,-3),
                                  exp2, PARAM(SP,-4), ENTER;

SET := [3, I..J];                 LIT(0), LIT(3), ADDELEM,
                                  REF(I), REF(J), SUBSET, UNION,
                                  ASSIGN(SET);

Fig. 4 - Equivalence between PASCAL and PASC-HLL structures
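
To make the Polish-form sequence for the assignment in Fig. 4 concrete, the sketch below evaluates REF(T), REF(I), INC, INDEX, DEC, ASSIGN(X) with an ordinary push-down stack. This is only an illustration of the notation and of the two syntactic rules quoted earlier: the PASC-HLL machine itself uses a FIFO evaluation queue, and the tuple encoding of instructions, the `run` helper, and the use of 0-based Python lists for arrays are assumptions made for this example.

```python
def run(program, memory):
    """Evaluate a Polish-form instruction list against a dict of variables."""
    stack = []
    for op, *args in program:
        if op == "REF":      # push the value of a variable
            stack.append(memory[args[0]])
        elif op == "LIT":    # push a literal
            stack.append(args[0])
        elif op == "INC":
            stack.append(stack.pop() + 1)
        elif op == "DEC":
            stack.append(stack.pop() - 1)
        elif op == "INDEX":  # first operand an ARRAY, second a SCALAR
            index = stack.pop()
            array = stack.pop()
            if not isinstance(array, list) or not isinstance(index, int):
                raise TypeError("INDEX needs an ARRAY then a SCALAR")
            stack.append(array[index])
        elif op == "ASSIGN":
            if len(stack) != 1:  # the expression must be complete
                raise ValueError("expression not complete before ASSIGN")
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


# X := T[I+1]-1
program = [("REF", "T"), ("REF", "I"), ("INC",), ("INDEX",), ("DEC",), ("ASSIGN", "X")]
memory = run(program, {"T": [10, 20, 30, 40], "I": 1, "X": 0})
```

With T = [10, 20, 30, 40] and I = 1, the sequence selects T[2] = 30 and stores 29 in X.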
[Dependency Queue figure labels: just modified variables; deferred access variables]
Abstract

MCODE is a high-level language, stack machine designed to
support strongly-typed, Pascal-based languages with a variety
of data types. The instruction set is constructed for
efficiency and extensibility and is based on an examination of
common programming language operations. The architecture
provides programmed control over both operand type selection
and address field widths. In addition, right operand addressing
is included to improve the size characteristics of MCODE
instructions over those of traditional stack machines. The
design is compared for efficiency with the instruction sets of
the EM-1, Digital Equipment PDP-11 and VAX-11/780.

CR Categories: 4.12, 4.22, 4.9, 6.21

Keywords and Phrases: stack machine, computer architecture,
addressing modes.

1. Introduction

With the growing use of high-level languages for systems and
applications programming, computer instruction set design has
moved from bit selection of internal CPU data paths to
instruction sets which are oriented to common high-level
language operations. Tanenbaum[10] discusses a stack machine
(EM-1) designed with this philosophy. The EM-1 is intended to
directly execute the code produced by the SAL compiler. SAL is
a typeless systems programming language similar to BCPL[9]. In
this paper, we have extended the EM-1 to provide an instruction
set for a Pascal-based, strongly-typed, systems programming
language, Modula[12], which was designed by Wirth and
implemented by Cook[6]. Our Modula machine code, MCODE, not
only provides extensible type operations but also maintains the
efficiency of the EM-1. The EM-1 was designed based on an
analysis of 300 procedures comprising 10,000 lines of code. The
MCODE improvements are based on our analysis of 5,000 Pascal
procedures with over 160,000 lines of program text.

The next section gives a brief EM-1 description which is
followed by a discussion of the MCODE improvements. Also, the
instructions used for expressions and Modula statements are
illustrated. Finally, some comparisons are drawn with respect
to other current architectures, including the PDP-11[1] and
VAX[2].

The author is partially supported by U. S. Army contract
DAAG29-75-C-0024 and National Science Foundation grant
MCS-7903947.

2. Background

Tanenbaum designed the EM-1[10] to optimize the most frequently
occurring high-level operations in programs as analyzed by
himself, Knuth[8], Alexander and Wortman[3], and Wortman[13].
The most effective innovations in the EM-1 are encoding
references to the first 12 bytes of local procedure storage and
8 bytes of static storage as single opcodes, array element
accessing, and "if" statement comparison and branching. The
hypothesis is that smaller code sizes will enhance faster
program execution by better utilizing the bandwidth of CPU data
paths. In addition, as the machine gets closer to the source
language, compilers can produce more efficient code and can
eliminate space-consuming peephole optimization routines.

Another important aspect of the EM-1 design is the notion of
giving the programmer code improvement tools which are machine
independent. In Knuth's Fortran analysis[8], he strongly
suggested that program execution histories be automatically
generated for each job. With Tanenbaum's machine organization,
the programmer need only declare the most frequently used
variables first in textual order to effect a performance
improvement.

3. Extensions

The first problem that we found in trying to use the EM-1
design was its lack of a variety of data types. Modula provides
the user with character, Boolean, long and short integer, and
floating point operations. When the EM-1 is extended to
encompass these operations, the 255 opcode limit is quickly
exceeded. Our solution was to introduce modes of computation. A
mode sets the CPU's fetch and execute microprogram to adapt to
a particular data type such as floating or integer. A
collection of 8-bit opcodes is provided to set the CPU mode.
Therefore, a single "+" opcode suffices for all addition
operations on any data type. The setting of the mode can be
thought of as the replacement of the microcode jump table for a
subset of the opcodes.

The mode approach is based on our observation that expressions
are usually comprised of operands of the same type; thus, we
expect that the space occupied by any extra instructions needed
to set the mode will be offset by the savings in opcode space.
Modes also provide an expansion and contraction capability for
machine families. For instance, all floating point operations
could be eliminated to build microprocessors intended for
traffic control or a decimal mode could be added for commercial
applications. For many environments, the savings in microcode
space could be significant.
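
The idea of a mode as a replaceable jump table can be sketched in a few lines. This is an illustrative model only: the opcode name `SETMODE`, the mode names, and the `Cpu` class are assumptions introduced here and are not MCODE mnemonics taken from the paper.

```python
import operator

# One table of operator routines per computation mode (names are assumed).
MODE_TABLES = {
    "INTEGER": {"+": operator.add, "*": operator.mul},
    "FLOATING": {
        "+": lambda a, b: float(a) + float(b),
        "*": lambda a, b: float(a) * float(b),
    },
}


class Cpu:
    """Toy stack CPU whose arithmetic opcodes dispatch through the current mode."""

    def __init__(self):
        self.table = MODE_TABLES["INTEGER"]
        self.stack = []

    def execute(self, opcode, operand=None):
        if opcode == "SETMODE":  # swap the jump table for the arithmetic opcodes
            self.table = MODE_TABLES[operand]
        elif opcode == "PUSH":
            self.stack.append(operand)
        else:                    # a single "+" opcode serves every mode
            right, left = self.stack.pop(), self.stack.pop()
            self.stack.append(self.table[opcode](left, right))


cpu = Cpu()
for instruction in [("PUSH", 1), ("PUSH", 2), ("+",),
                    ("SETMODE", "FLOATING"), ("PUSH", 1), ("PUSH", 2), ("+",)]:
    cpu.execute(*instruction)
```

Removing the "FLOATING" entry, or adding a decimal one, changes the machine family member without touching the opcode assignments, which is the expansion and contraction capability described above.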

Our second extension was to provide direct addressing for right
operands. According to all of the analyses, expressions tend to
be simple. Tanenbaum found, for instance, that 31% of all
assignment statements had a single term for a right hand side.
Consider the evaluation of "a+b" on a typical stack machine. We
must "push a", "push b", "pop b and add", and "replace a with
result". The alternative is to "push a", "add b", and "replace
a with result". This sequence not only saves an instruction
fetch but also the redundant push and pop of "b" plus the
instruction space. These savings will be replicated for every
term in any expression which can be evaluated from left to
right.
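
The two instruction sequences for "a+b" can be compared directly with a toy interpreter. The sketch below is an illustration under assumptions: the tuple encoding and the `run` helper are hypothetical, and "replace a with result" is modelled as a POP into a.

```python
def run(program, memory):
    """Execute a tiny stack program; return (memory, instructions fetched)."""
    stack = []
    for opcode, *operand in program:
        if opcode == "PUSH":
            stack.append(memory[operand[0]])
        elif opcode == "ADD" and operand:  # right-operand form: "add b"
            stack.append(stack.pop() + memory[operand[0]])
        elif opcode == "ADD":              # pure stack form: "pop b and add"
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif opcode == "POP":              # "replace a with result"
            memory[operand[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory, len(program)


pure_stack = [("PUSH", "a"), ("PUSH", "b"), ("ADD",), ("POP", "a")]
right_operand = [("PUSH", "a"), ("ADD", "b"), ("POP", "a")]

stack_memory, stack_fetches = run(pure_stack, {"a": 2, "b": 3})
right_memory, right_fetches = run(right_operand, {"a": 2, "b": 3})
```

Both programs leave the same value in a; the right-operand version fetches one instruction fewer per term.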

Finally, we have extended Tanenbaum's single byte addressing
modes, provided an option to shorten address fields, improved
subscripting, record and pointer referencing, and introduced
some additional high-level language oriented constructs. In the
next section, we will discuss operand addressing.


4. Operand Addressing

The three MCODE instruction formats are illustrated below:

FORMAT 1:
FORM 2,3,3     0, opcode, local address

FORMAT 2:
FORM 8         opcode [operands]

FORMAT 3:
FORM 8,8       255, opcode [operands]

In MCODE, addressing is partitioned into references to either
static or local procedure storage. The MCODE machine uses byte
addressing and has an address space of 2**32 bytes. The
instruction formats are designed so that the most frequently
occurring operations require a minimum of instruction space.

A format 1 instruction can address the first 8 16-bit words of
the current procedure's activation record. The impact of this
convention can be seen by noting that our results indicate that
97% of all procedures have fewer than 4 formal parameters and
90% of all procedures have fewer than 4 local variables.
Tanenbaum's short address convention for static variables was
eliminated since the size of the static address space is not
known until load time. However, the number of parameters and
local variables is known at compile time. In addition, our
analysis shows that 54% of all variable references were to
local variables or parameters. To test the effect of this idea,
we changed all the local variables in the Modula compiler to
C[7] "register" variables which decreased each instruction
reference by 16 bits. The compiler's size decreased by 10% and
its compile rate went up several hundred lines per minute.

The format 2 and 3 instructions can have their operands on the
stack or can have a right operand specification. Operand
addressing is optimized in a fashion similar to that provided
by the B1700[11]. The AMODE instruction sets the address field
width to 8, 16, or 32 bits for references to either static or
local storage. Note that program counter relative addressing is
not affected. More than 90% of all Modula programs can use an
AMODE which selects 8-bit local and 16-bit static addresses.

As an example, the 8-bit AMODE setting would save 8 bits per
operand reference over the 16-bit addresses used in the PDP-11.
The AMODE setting has no effect on indirect addressing on the
stack. The VAX implements 8-bit address fields but an 8-bit
selector is also required for a total of 16 bits.

A natural concern, however, is keeping AMODE set correctly.
Since Modula has no "go to", the AMODE bookkeeping is easily
maintained on the parse stack. Also, the procedure call
instructions automatically save and restore mode information.
In addition, the linkage editor is responsible for checking
address field overflow if too small an AMODE is being used.
MCODE implements the following addressing forms:

A   operands on the stack
B   {static | local} x {direct | indirect}
C   local direct
D   indirect address on the stack
E   32-bit absolute address
F   constant (8, 16, 32 bits)
G   constant (0-15)
H   {subscript | element} x B
      subscript = ((sp↑)-1) * Mode size + EA
      element   = (sp↑) + EA
      (EA = Effective Addr.)
I   local x (direct, indirect, indirect x (subscript, element))
J   8-bit jump offset
K   16-bit jump offset
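
The two address calculations of form H can be written out as a short worked example. This is a sketch under assumptions: the function names are hypothetical, the stack is a Python list whose last item is the top, and the reading of the "element" form as a popped base address plus a field offset EA is one interpretation of the formula.

```python
def subscript_address(stack, ea, mode_size):
    """'subscript' form: pop a 1-based index and scale it by the mode size."""
    index = stack.pop()
    return (index - 1) * mode_size + ea


def element_address(stack, ea):
    """'element' form: pop a value from the stack and add EA to it."""
    base = stack.pop()
    return base + ea


# Third element of an array of 2-byte items that starts at address 100.
third = subscript_address([3], ea=100, mode_size=2)
# Field at offset 6 of a record whose base address 200 is on the stack.
field = element_address([200], ea=6)
```

Because the subscript form subtracts one before scaling, an array with a lower bound of one needs no adjustment of the address field.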


Forms B and H cover accesses to simple variables, pointers, one
dimensional arrays, and record elements occurring in static and
local storage, or as parameters. Subscript addressing assumes a
lower bound of one which is the most common case. For direct
addressing, different lower bounds can be subtracted from the
address field to produce the correct subscript calculation.
Forms F and G are used for immediate addressing while forms E,
J and K are used for program counter relative jumps and
absolute addressing. Forms I and C are used with the format 1
(8 bit) instructions. Form I can be used to access local
variables, "const" simple parameters, "var" simple parameters,
and array and record parameters.

Tanenbaum[10] recommends that references to global procedure
variables be implemented by a microcode search of the procedure
call back-chains. The claim is that this method eliminates the
overhead of maintaining a static display. Based on our
experience with implementations of Algol[5] and Pascal[4], a
single reference to a global variable uses more time than that
needed to update the display. The following code sequence is
typical.

procedure entry:

    CONTROLBLOCK[SAVE] ← DISPLAY[NEST]
    DISPLAY[NEST] ← PB

procedure exit:

    DISPLAY[NEST] ← CONTROLBLOCK[SAVE]
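
The entry and exit sequence above can be modelled as follows. This is a minimal sketch, assuming that each call keeps its saved display entry in a per-call control block; the class name `Machine`, the method names, and the use of a Python list for the control blocks are hypothetical.

```python
class Machine:
    """Toy model of display maintenance on procedure entry and exit."""

    def __init__(self, max_nesting=10):
        self.display = [None] * max_nesting  # one entry per nesting level
        self.control_blocks = []             # control blocks of active calls

    def enter(self, nest, procedure_base):
        # CONTROLBLOCK[SAVE] <- DISPLAY[NEST]; DISPLAY[NEST] <- PB
        self.control_blocks.append({"nest": nest, "save": self.display[nest]})
        self.display[nest] = procedure_base

    def leave(self):
        # DISPLAY[NEST] <- CONTROLBLOCK[SAVE]
        block = self.control_blocks.pop()
        self.display[block["nest"]] = block["save"]


machine = Machine()
machine.enter(0, 100)
machine.enter(1, 200)
machine.enter(1, 300)  # a second procedure at the same nesting level
innermost = machine.display[1]
machine.leave()
restored = machine.display[1]
```

Each entry and exit costs two assignments and one assignment respectively, independent of how deeply the procedures are nested.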

The first ten locations in static storage are used for the
DISPLAY. According to our study, 85% of all procedures were not
nested; 11% were nested one level; and 4% were nested 2 or more
levels. Out of the 5,000 procedures that we examined, one was
nested to 4 levels. Therefore, a maximum of ten nesting levels
was considered sufficient. Next, we will examine the format of
the one byte instructions.

5. Local Variable References

We followed Tanenbaum's design by allocating 64 opcodes to
special addressing. As we discussed previously, the local
variable address space was set at 8 16-bit words, or a 3-bit
address field. This left 3 bits for opcodes. These 8 opcodes
were partitioned as follows:


PUSH      Form I      (sp↓) ← (EA)
POP       Form C      (EA) ← (sp↑)
ADD       Form C      (sp) +← (EA)
SUB       Form C      (sp) -← (EA)
CMPB=     Form C,K    if (sp↑) = (EA) then (pc) +← SE(K)
CMPB<>    Form C,K    if (sp↑) <> (EA) then (pc) +← SE(K)


The PUSH instruction uses two opcodes for direct or indirect
references to simple variables, and two opcodes for indirect,
or "var", references to arrays and records. The number of
addressing modes for POP was decreased to one in order to
increase the number of opcodes. In addition, we found that
variable loads occur in a 2.7/1 ratio over variable assignments
which indicates that POP is used less frequently than PUSH. The
last four opcodes were assigned based on our frequency of use
information. Out of all operator occurrences, "+" was used 21%
of the time, "-" was used 9%, "=" was used 20%, and "<>" was
used 10% of the time. According to Tanenbaum, the dynamic
frequency of these operators is even higher. In conditional
expressions, we found that "=" made up 33% of all operators and
that "<>" was used 17% of the time. Since Tanenbaum found that
"if", "repeat", and "while" had a dynamic frequency of 38%, the
comparisons were implemented to both test and jump. Using these
formats, many subprograms can be completely coded using only 8
bit instructions.
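
The one-byte format (FORM 2,3,3: a zero tag, a 3-bit opcode and a 3-bit local address) can be packed and unpacked as shown below. This is a sketch only: the helper names are hypothetical, and placing the fields most significant first within the byte is an assumption, since the bit order is not given in the text.

```python
def encode_format1(opcode, local_address):
    """Pack a 3-bit opcode and a 3-bit local address under a zero 2-bit tag."""
    if not 0 <= opcode < 8 or not 0 <= local_address < 8:
        raise ValueError("opcode and local address must each fit in 3 bits")
    return (opcode << 3) | local_address


def decode_format1(byte):
    """Split an 8-bit instruction into (tag, opcode, local address)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("format 1 instructions are one byte long")
    tag = byte >> 6                  # leading 2-bit field, 0 for format 1
    if tag != 0:
        raise ValueError("not a format 1 instruction")
    opcode = (byte >> 3) & 0b111     # one of 8 opcodes
    local_address = byte & 0b111     # one of the first 8 16-bit words
    return tag, opcode, local_address
```

Under this layout the format 1 instructions occupy byte values 0 to 63, which matches the 64 opcodes set aside for special addressing.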

6. Right Operand Addressing

Because of the number of opcodes needed for right-operand
addressing, we restricted the operators based on the same
frequency analysis which was used to select the 8-bit
instruction set. The following table lists the instructions
which can address memory:


PUSH      Form A,B,D,F,H,G    (sp↓) ← (EA)
POP       Form A,B,D,H        (EA) ← (sp↑)
PUSHA     Form B,E,H          (sp↓) ← EA
ADD       Form A,B,F          (sp) ← (sp) + (EA)
ADDTO     Form B              (EA) ← (EA) + (sp↑)
AND       Form A,B,F          (sp) ← (sp) & (EA)
CLR       Form B              (EA) ← 0
CMPB=     Form A,B,F          if (sp↑) = (EA)
CMPB<>    Form A,B,F          if (sp↑) <> (EA)
DEC       Form B              (EA) ← (EA) - 1
INC       Form B              (EA) ← (EA) + 1
MUL       Form A,B,F          (sp) ← (sp) * (EA)
SUB       Form A,B,F          (sp) ← (sp) - (EA)
SUBFM     Form B              (EA) ← (EA) - (sp↑)

The selected operators make up 80% of all operator references
in the Pascal programs that we analyzed. Address modes B and F
were chosen since 35% of all operand references were to simple
variables and 36% of all operands were constants. The ADDTO and
SUBFM instructions correspond to Modula statements.

7. Array, Record and Pointer References

Simple record references are treated just like simple variable
references and can be accessed using direct addressing.
However, arrays of records or records as parameters must be
accessed by an offset from a base address. The "element"
address mode implements the pointer or parameter case.

Our analysis showed that 20% of all array references had a
single constant subscript and that 60% of all subscripts were a
single variable. The constant subscript case resolves to a
variable address so the standard address formats can be used to
access the array. The "subscript" mode was introduced to
implement accesses to one dimensional arrays. In fact, we found
that references to multidimensional arrays made up only 10% of
all array references. MCODE uses descriptors to implement the
multidimensional case.

In the EM-1, every array has an array descriptor cell, an array
descriptor packet and an array data area. This approach works fine
for Algol but not for Pascal-like languages. First, in Pascal all
arrays have static bounds, so a single descriptor can be generated in
static storage. This approach allows descriptors to be shared and
saves stack space as well as setup time. Secondly, Pascal allows
arrays of arrays and pointers to arrays, which implies that the base
address of an array may already be on the stack and not in a
descriptor. The MCODE SUBS instruction transforms the subscripts into
a single byte offset which can then be used by the PUSH or POP
instructions. The SUBS instruction also checks each subscript for
validity.
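The subscript transformation just described can be pictured with a small sketch. It is only an illustration under stated assumptions: the names `ArrayDescriptor` and `subs` are hypothetical, the descriptor is assumed to hold one bounds pair and one multiplier per dimension plus an element size, and the lower bound is subtracted explicitly rather than folded into a precomputed origin.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArrayDescriptor:
    """Hypothetical static descriptor shared by all arrays of one type."""
    bounds: List[Tuple[int, int]]  # (low, high) pair for each dimension
    multipliers: List[int]         # elements skipped per unit step in each dimension
    element_size: int              # size of one element in bytes


def subs(descriptor: ArrayDescriptor, subscripts: List[int]) -> int:
    """Check every subscript and fold the list into a single byte offset."""
    if len(subscripts) != len(descriptor.bounds):
        raise IndexError("wrong number of subscripts")
    index = 0
    for value, (low, high), multiplier in zip(
        subscripts, descriptor.bounds, descriptor.multipliers
    ):
        if not low <= value <= high:
            raise IndexError(f"subscript {value} outside {low}..{high}")
        index += (value - low) * multiplier
    return index * descriptor.element_size
```

Because the descriptor carries no base address, the same instance can serve every variable of the array type; the offset is simply added to whatever base address is already available.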


SUBS   descriptor address

The instruction address points to an array descriptor which contains
the number of bounds, bounds pairs, multipliers, element size and
virtual origin. SUBS leaves the element index on the stack. For
instance, "A[I].B[J]" would produce the following code.

PUSH   I
SUBS   A desc
PUSHA  elemen
PUSH   J
SUBS   B desc
ADD

For most Modula programs, each array type can be described by a
single instance of a descriptor no matter how many variables of that
type are created. Next, the expression operators will be described.

8. Operators

The following table lists the MCODE operators, which are all format 2
instructions.

ABSolute         LoGical Shift
ARith. Shift     MOD
CONVert          NEGate
DECrement        NOT
Divide           OR
DUPlicate        SQuaRe
INCrement        XOR

MCODE also includes instructions for moving and comparing blocks of
storage as well as library call instructions to implement the Modula
virtual machine environment and the floating-point math routines. In
the next section, the code generated for the "case", "if" and "for"
statements will be discussed.

9. Statements

Procedure call and return are very similar to the EM-1, except for
the display updating, and will not be described. The "if" statement
is implemented with the following instructions:

CoMPare          =  >  <  >=  <=  <>
CoMPare Branch   =  >  <  >=  <=  <>
Branch           =0  <>0
Branch

As an example, the statement "if a<>b then inc(a) end" would generate
the following code:

Instructions     Size     PDP-11       Size

PUSH   a                  CMP  a,b
CMPB=  b L1       24      JEQ  L1       16
INC    a          16      INC  a        32

The syntax and code generated for the "for" statement are listed
below.

for v := e1 by e2 to e3 do S end

      PUSHA   v
      PUSH    e1
      PUSH    e2
      PUSH    e3
      FOR     L2
L1:   S
      ENDFOR  L1
L2:

The "case" instructions are as follows:

CASE     constant, offset
CASE     constant, constant, offset
CASETBL  constant, constant

These three forms cover the situations in which the "case" is
distinguished by a single value, a range of values, or a jump table.
Next, we will analyze the effectiveness of MCODE with respect to
other machine designs.
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10 .  Comparison  with  Other  Machines 

The results in Figure 1 extend the table in Tanenbaum [10] to include
the VAX and MCODE. Obviously, the special addressing and
descriptor-based array computations make a significant difference.
MCODE performs better than the EM-1 for expressions and parameter
referencing and is as good in all other areas. The difference in the
"if" tests occurs because the EM-1 assumes a 2-bit field for branch
offsets while we used an 8 bit field. The VAX instructions are
computed using 8 bit displacement addressing. In addition, it should
be pointed out that the VAX and MCODE are supporting many more data
types than the PDP-11 or the EM-1. Figure 2 recomputes the space for
the same statements but with all the machines forced to use 16 bit
addressing.

The values in Figure 2 give a lower bound on the performance of MCODE
whereas Figure 1 gives an upper bound on the difference. For 16-bit
addressing, which would be used for references to static storage,
MCODE is better in all categories. The EM-1 is forced to use a 16-bit
opcode to access 16-bit addresses, which results in its poor
performance. Since 47% of all variable references are to static
storage, we feel that this improvement could have a significant
impact on execution speed. The VAX is still quite poor with respect
to subscripting even though a special instruction is available for
that purpose. Also, the figures do not reflect the dynamic effect of
the savings, since Tanenbaum's measurements indicate that the Figure 1
results are even more significant at runtime.

11. Conclusions

We feel that the availability of modes as an extension mechanism for
high-level language machines can be a significant factor in adapting
microprocessors to changing environments. Also, modes contribute to
space efficiency in the instruction set. The use of address mode
settings to reduce address field sizes and right operand addressing
also contribute space savings. The current version of Modula produces
PDP-11 or VAX code, so we have the means to compare the exact
statistics on the static and dynamic behavior of MCODE with these
machines using the same programs in the same environment. Our
analysis should contribute to the alternatives available for opcode
design in modern machine families.
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figure  1 

Direct Addressing Instruction Size (in bits)

Statements             MCODE    EM-1    PDP-11    VAX

i := 0                   16       8       32       24
i := 3                   16      24       48       32
i := j                   16      16       48       40
i := i+1                 16       8       32       24
i := i+j                 24      32       48       40
i := j+k                 24      32       96       56
i := j+1                 24      24       80       48
i := a[j]                24      32      128      104
a[i] := 0                32      32      112       88
a[i] := b[j]             40      48      192      168
a[i] := b[j]+c[k]        64      80      304      248
a[i,j,k] := 0
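As a rough aid to reading the direct-addressing figures, the sketch below totals the instruction sizes per machine and expresses them relative to MCODE. It is an illustration only: the dictionary holds the eleven rows whose values are legible, the EM-1 entry for `i := 0` is read as 8 (an assumption, since the digit is damaged in the source), and the names `FIGURE_1`, `totals` and `relative_to_mcode` are hypothetical.

```python
MACHINES = ("MCODE", "EM-1", "PDP-11", "VAX")

# Instruction sizes in bits, one tuple per statement, in MACHINES order.
FIGURE_1 = {
    "i := 0": (16, 8, 32, 24),  # EM-1 value assumed to be 8
    "i := 3": (16, 24, 48, 32),
    "i := j": (16, 16, 48, 40),
    "i := i+1": (16, 8, 32, 24),
    "i := i+j": (24, 32, 48, 40),
    "i := j+k": (24, 32, 96, 56),
    "i := j+1": (24, 24, 80, 48),
    "i := a[j]": (24, 32, 128, 104),
    "a[i] := 0": (32, 32, 112, 88),
    "a[i] := b[j]": (40, 48, 192, 168),
    "a[i] := b[j]+c[k]": (64, 80, 304, 248),
}


def totals(table=FIGURE_1):
    """Sum the sizes of all listed statements for each machine."""
    return {
        machine: sum(row[column] for row in table.values())
        for column, machine in enumerate(MACHINES)
    }


def relative_to_mcode(table=FIGURE_1):
    """Express each machine's total as a multiple of the MCODE total."""
    sums = totals(table)
    return {machine: sums[machine] / sums["MCODE"] for machine in MACHINES}
```

An unweighted total treats every statement as equally frequent, so it says nothing about real programs; weighting by measured statement frequencies would be the natural refinement.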
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Figure  2 

16-Bit Address Fields
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The  instruction  set  for  the  SYMBOL  computer 
system is discussed in detail. The SYMBOL computer
is a large scale multiprocessor which implements a
high level language, compiler, text editor and time-shared
operating system entirely in hardware. The
intent of the paper is to document the instruction set
as used in the working system for over seven years.
Covered are the internal codes, what they do, and the
associated machine maintained data structures.

Introduction 

The SYMBOL computer system [1,2,3] is of great importance in
the field of computer architecture since it represents a major departure
from von Neumann architectures and is one of the few examples of an
experimental (or commercial) machine that resulted in a full scale
working High Level Language Computer System. Although the high
level SYMBOL Programming Language (SPL) was implemented in the
machine without the aid of software, SYMBOL does have an internal
instruction set much like any computer. Unlike most computers, however,
the SYMBOL Instruction Set is non-von Neumann and at a very
high level, with almost a one-to-one mapping between tokens in the
source code and instructions in the object code. Though the instruction
set is probably the best way to describe the computational abilities
of SYMBOL, it has been one of the least documented features. This
paper seeks to fill this gap by providing a detailed description of the
instructions, how they are executed, and the internal data structures
used in executing SYMBOL object programs.

SYMBOL  Architecture  Overview 

Because of SYMBOL's unusual machine architecture, a brief
description of the system is in order. The SYMBOL computer system
is composed of eight relatively autonomous processors: the System
Supervisor, the Input/Output Processor, the Channel Controller, the
Drum Controller, the Memory Reclaimer, the Memory Controller, the
Translator, and the Central Processor. The last three of these are of
special interest for the purposes of this paper.

Program execution is controlled by the Central Processor, which
is itself composed of four sub-processors. The Instruction Sequencer is
responsible for fetching instructions, executing some directly and
delegating the rest to another sub-processor. The Arithmetic Processor [4]
performs traditional arithmetic operations with precision controlled,
decimal arithmetic. The Format Processor [5] handles character
oriented operations, as well as the packing and unpacking of numbers.
Lastly, the Reference Processor controls all identifier referencing.

One of the more unusual aspects of SYMBOL is that the
memory structure is not organized as a contiguous set of sequentially
numbered storage cells. Instead, storage is viewed by most of the
processors as a limitless supply of variable length storage strings, whose
storage cells (machine words) are logically sequential but may not be
consecutively addressed in memory. The SYMBOL memory structure
consists of four hierarchical levels. At the lowest level a core memory
and a rotating magnetic drum constitute the physical memory. Next is
a paged virtual storage system consisting of 2^24 64-bit words, of which
4096 2K-byte pages were implemented. The Memory Controller, with
a set of high level memory operations, maps virtual storage into
"logical storage." [6,7] At the highest machine level are the operations which
operate on the user data structures. Throughout the rest of this paper
the word "string" referring to storage means a separate and logically
sequential series of words, used in the same manner as segments in
other computer systems.

The SYMBOL Programming Language

Because the instruction set of the SYMBOL machine is so
directly tied to the language it implements, the reader will find the
following sections easier to understand by referring to one of the many
descriptions of the language [5,8,9,10,11]. Basically, SPL is a
general-purpose procedural block-structured language. In many ways, it can
be viewed as a mixture of APL, ALGOL and LISP. The language is
free of most declarations as to the size or type of data objects: these
attributes can vary dynamically during the life of a program. Data
objects are either scalars (i.e. a sequence of characters that may
happen to fit the definition of a number, Boolean, or string), or
structures whose elements are either scalars or other structures.
Structures may be of any arbitrary shape which is representable by a
tree, and may not be recursively defined. Procedures pass parameters
via call-by-name, also known as call by substitution. There are no
automatic variables; all variables are statically allocated. Scoping rules
are such that a variable is known only locally, unless explicitly declared
to be global. SPL also has ON blocks, similar in many ways to ON
blocks in PL/I.

Instruction  Set  Overview 


SYMBOL instructions are ordered in reverse Polish notation and
make use of an expression evaluation stack. SYMBOL uses both
descriptors and tags for recording the attributes and structure of data.
Descriptors are grouped together in Name Tables, generated by the
Translator at compile time. Type tags are associated with the data: at
the beginning of a data object the tag records the type; a tag is also
used to denote the end of a data object. The basic instruction set is
shown in Table 1, with the internal bit representation shown in
hexadecimal. Throughout this paper internal codings will be shown in
hexadecimal unless otherwise indicated. Addresses in SYMBOL are 24
bits long and address sixty-four bit words. For hardware bussing
simplicity, each word contains a maximum of two instructions, each
half-word instruction consisting of an eight bit opcode followed by a
twenty-four bit address field. Only six of the opcodes require an
address.
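The half-word instruction layout can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, and placing the first instruction in the high-order half of the 64-bit word is an assumption the text does not state.

```python
OPCODE_BITS = 8
ADDRESS_BITS = 24
HALF_BITS = OPCODE_BITS + ADDRESS_BITS  # 32 bits per instruction


def pack_half(opcode: int, address: int = 0) -> int:
    """Build one half-word instruction: 8-bit opcode, then 24-bit address."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 24 bits")
    return (opcode << ADDRESS_BITS) | address


def pack_word(left, right) -> int:
    """Pack two (opcode, address) pairs into one 64-bit word.

    Assumption: the left instruction occupies the high-order half.
    """
    return (pack_half(*left) << HALF_BITS) | pack_half(*right)


def unpack_word(word: int):
    """Split a 64-bit word back into two (opcode, address) pairs."""
    halves = ((word >> HALF_BITS) & 0xFFFFFFFF, word & 0xFFFFFFFF)
    return tuple(
        ((half >> ADDRESS_BITS) & 0xFF, half & 0xFFFFFF) for half in halves
    )
```

For opcodes that take no address, the address field is simply left at zero in this sketch.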

Internal Representation of Data Values

The storage format for scalar character string values consists of a
String Start character (F5), followed by the characters in the string in
ASCII, followed by a String End character (F6); this is called the data
string format. Scalar values appear in the object string as a result of
literals in the source string. A scalar value may be stored in a Name
Table if it is six or fewer characters in length (since there must be
room for the string start and end characters in an eight character
word). For longer strings, a memory string is allocated, and a pointer
to this string is placed in the Name Table. However, if the string
should later shrink to six or fewer characters, it would remain in the
memory string, and not be placed in the Name Table.
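The data string format and the six-character rule can be sketched as follows. This is an illustration, not SYMBOL code: the function names are hypothetical, and only the F5 and F6 delimiters and the eight-byte word size come from the description above.

```python
STRING_START = 0xF5  # String Start character
STRING_END = 0xF6    # String End character
WORD_BYTES = 8       # one 64-bit word


def encode_data_string(text: str) -> bytes:
    """Encode a scalar as start character, ASCII characters, end character."""
    return bytes([STRING_START]) + text.encode("ascii") + bytes([STRING_END])


def fits_in_name_table(text: str) -> bool:
    """True when the delimited string fits in a single eight-byte word."""
    return len(encode_data_string(text)) <= WORD_BYTES
```

A six-character value such as "SYMBOL" occupies exactly eight bytes once delimited, so it is the longest scalar that can live directly in the Name Table.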
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Table  1.  SYMBOL  Instruction  Set 
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A  second  storage  format  is  used  for  packed  decimal  numbers; 
the numeric field format. This is the format in which the Arithmetic
Processor produces its results. If an operand for an arithmetic
operation is in data string format, the Format Processor will automatically
convert the operand into the numeric field format before the
Arithmetic Processor proceeds with its operation. The components of a
number stored in numeric field format are the exponent sign, the
mantissa sign, the exponent magnitude, the mantissa, and the precision
code. The exponent and mantissa signs appear as two bits of the start
character (F0, F1, F2, or F3). The character following the start
character contains the exponent magnitude as two BCD digits. The
mantissa of one or more digits is stored after the exponent character and
occupies as many words as are required at ten digits per word. (The first
two and last bytes of a word are not used for packed BCD data so that
mantissa digits will always occupy the same portion of the word.)
Following the last mantissa digit is a four bit precision code which
indicates that the number represented has either infinite precision (1111),
or only that precision implied by the number of mantissa digits (1110).
The representation for a true numeric zero starts with an F4. The last
word of the numeric string is indicated by a set high order bit in the
last byte.


Structure values may appear on the stack or in the object string
in linear format, or elsewhere in tree format [12]. In tree format, a
structure is stored in a memory string as a succession of scalar values and
links to substructures. The scalars are stored with start and end
characters as described above, and are aligned on word boundaries. If, at
a later time, the scalar expands and requires more space, then
additional (64 byte) memory groups are linked (inserted) into the memory
string. A link to a substructure consists of a single word beginning
with the character EC, and containing an address pointing to a
separate memory string where the substructure begins. Following the
last component in the structure, and in each substructure, is the End
Vector character (F7). A null vector, the analogue of the null string,
is stored simply as a memory string beginning with an F7.

In the linear format, the structure value begins with a Begin
Structure character (FD), and ends with an End Structure character
(FF). Between these two characters are stored the components of the
first level of the structure. Scalar components are stored with start and
end characters, aligned on word boundaries as in the tree format. If
the component is a substructure, however, it is represented by a Start
Vector character (FC), followed by the components of the substructure
(which may be scalars or structures), followed by an End Vector
character (FE). The start and end characters FC, FD, FE, and FF of
the linear format are the only essential characters in the words they
begin, so seven bytes are always wasted in these words.
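The linear format lends itself to a short recursive sketch. The encoder below is illustrative only: nested Python lists stand in for SPL structures, the function names are hypothetical, and padding unused bytes with zeros is an assumption (the text says only that those bytes are wasted).

```python
WORD_BYTES = 8
BEGIN_STRUCTURE, END_STRUCTURE = 0xFD, 0xFF
START_VECTOR, END_VECTOR = 0xFC, 0xFE
STRING_START, STRING_END = 0xF5, 0xF6


def _pad(data: bytes) -> bytes:
    """Pad with zero bytes (assumed filler) up to the next word boundary."""
    return data + bytes(-len(data) % WORD_BYTES)


def _marker(code: int) -> bytes:
    """A marker character occupies a whole word; seven bytes go unused."""
    return _pad(bytes([code]))


def _scalar(text: str) -> bytes:
    """A scalar in data string format, aligned on a word boundary."""
    return _pad(bytes([STRING_START]) + text.encode("ascii") + bytes([STRING_END]))


def _components(items) -> bytes:
    out = b""
    for item in items:
        if isinstance(item, str):
            out += _scalar(item)
        else:  # a nested list stands for a substructure
            out += _marker(START_VECTOR) + _components(item) + _marker(END_VECTOR)
    return out


def encode_linear(structure) -> bytes:
    """Encode a nested list of strings in the linear structure format."""
    return _marker(BEGIN_STRUCTURE) + _components(structure) + _marker(END_STRUCTURE)
```

Encoding `["A", ["B"]]` yields six words: FD, the scalar A, FC, the scalar B, FE and FF, which shows how much space the four marker words consume.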

If an initial value statement in SPL is preceded by the keyword
SWITCH, the Translator treats the initial values that are being
assigned as identifiers for labels. The Translator stores values in the
object string as it would for an ordinary initial value statement, using a
single word to store each scalar value. The scalar label value is stored
as a word beginning with the character D0 (which is also the opcode
for the Name Table Pointer instruction), followed by a 24-bit address
pointing to a data descriptor for the label in a Name Table, followed
by the character F6. Label values may be "moved around" like other
values (e.g. assigned to variables, passed as procedure arguments,
returned as function values, etc.). And of course, a label value may
be the operand of a Go To instruction. Label values cannot appear as
operands for any arithmetic, string or Boolean operations.


The  Evaluation  Stack 

For each active instance of a block, the Central Processor
maintains a stack to be used for evaluating expressions, procedure calling,
and passing information to other processors. (Whenever possible, the
Central Processor keeps the top word of the stack in an internal
register.) Each stack is a unique memory string and is created when a block
is entered, and deleted when the block is exited.

The first three words of the stack are a save area for the block.
When the stack is created, a pointer to the block's Name Table, and a
pointer to the calling block's stack are stored in the first word of the
stack. If this block should call another block (explicitly by a procedure
call or implicitly by an ON reference), then a pointer to the start of
the called block's object string and the contents of the status register
are stored in the second word. Also, the return point and top of stack
pointer are stored in the third word of the stack.

The remainder of the stack is for expression evaluation. If an
operand is being pushed on the stack and it is one word or less in
length, it is copied directly to the stack. Otherwise, it is left in place,
and a pointer (link) word is stacked. (The two exceptions to this rule
are operands for the Output operation, and values for the assignment
operations.)

Link words begin with a character indicating the nature of the
operand: E0 (simple variable or value), E1 (structure), E.1 (label), E4
(scalar value in Name Table), E5 (memory string containing
subscripted variable reference), E6 (scalar value that won't fit in one
word), E8 (scalar or structure valued component of a structure), EA
(IN reference to simple variable), EB (IN reference to structure), or
EE (IN reference to variable with value in Name Table). The code
generated for an IN expression is the IN instruction (H8), followed by
a subscripted variable reference. The result of the expression is a
Boolean value indicating whether the indicated component of the
variable exists. The left address field of the link word is used when
pointing at the data value and the right address field is used when pointing
to the descriptor for the value.

The Colon (BA), Integerize (DA), and Perform Subscription
(DD) instructions are related to structure references. Expressions for
subscripts are evaluated (using the stack for intermediate values) and
are then converted to a four digit integer. Each subscript is stored on
the stack in a word beginning with BA or DA, which indicates the
type of subscript. DD (Perform Subscription operation) follows the
last subscript. The character substring operation is handled as a form
of subscription. (For example, in SPL x[1,2:3] is the 3 character
string from the first component of x, beginning at character position
2.) The subscript preceding the colon is stacked in a word beginning
with the character BA (Colon operator). All the remaining subscripts
are stacked in words beginning with DA (Integerize operation). After
the Perform Subscription operator has been stacked, the link to the
variable being subscripted and the subscript list are moved to a new
memory string and a link word (E5) is placed on the stack pointing to
the string. The subscripted reference is not evaluated until it is
absolutely necessary to have the value or location indicated.

Name  Tables 

The Translator produces a Name Table for each block (main
program, procedure or ON block) in the source string. All references
made in that block to labels, procedures, or variables are made
through the descriptors in that block's Name Table. Figure 1 shows
the organization of the Name Table and Figure 2 shows the
organization of the control words. The first word of a Name Table is called
the Block Control Word. It contains two address fields which are used
to link all the Name Tables for a program. The first address field is
used to forward link all the Name Tables together in a single list,
beginning with the Name Table for the main program. The second
address field is used as a pointer to the Name Table for the statically
enclosing block. (The Block Control Word for the Name Table of the
main program, which is the outermost block, has no such pointer.) The
actual bit definitions are shown in Table 2.

The Block Control Word contains a bit indicating block in use,
and a bit indicating block recursed. When a block is entered, the
block-in-use bit is set, and the bit is cleared when the block is left. If
a block is reentered (i.e. entered when the block-in-use bit is already
set), the Central Processor calls on a software routine to perform a
"fixup" to handle recursion [13]. (The hardware was not specifically
designed to handle recursion.) This software must create a copy of the
block's Name Table, modify the original Name Table to initialize local
variables, and then set the block-recursed bit. Similarly, if a block is
left with the block-recursed bit set, another software routine is called
that must undo the work of the first routine.

Following the Block Control Word is a succession of entries for
each identifier in the block. Each entry consists of the ASCII name of
the identifier, taking as many eight-byte words as necessary and
padding with nulls, and followed by a one word data descriptor for the
identifier called the Identifier Control Word (IDCW). Table 3 shows
the bit layout of the IDCW.

If the identifier is a local variable, it is so tagged in the IDCW.
There are also flag bits to indicate whether the variable is scalar
valued, structure valued, or has not yet been assigned a value. If the
value is undefined, then the first time that the variable is accessed, it is
given a null, scalar value, which is interpreted as a zero for arithmetic
operations. If a value is defined, then that value may appear in the
object string, the Name Table (scalars only), or elsewhere in memory.

If an initial value statement occurs in the source string, the
Translator will place a pointer to the object string location of the value
in the IDCW for the variable. When the variable is first accessed, the
value is copied from the object string. (Note that for structures this
means converting from linear to tree format.) A pointer to the value
replaces the old pointer in the IDCW, and the bit indicating data in
object string is cleared.

For data in Name Tables, recall that scalars (excluding label
values) are stored with a start character which begins with hex "F".
This half byte at the beginning of an IDCW is used to indicate data in
Name Table (i.e. the IDCW is the scalar value itself). If the value is
elsewhere in memory, the IDCW contains a pointer to the memory
string where the value begins.

Global variables are variables that are known in outer blocks.
SYMBOL permits the nesting of blocks as does PL/I. In contrast to
PL/I, however, variables are local unless declared global in an SPL
Global statement. The IDCW contains a bit indicating if the variable
is global, and a pointer to the IDCW for the variable in the enclosing
block. There may in general be many levels of such indirection. If
the identifier is for a procedure, the global variable and procedure bits
of the IDCW will be set. Procedures are an exception to the local
unless declared global principle. A procedure's scope always extends
to inner blocks. The IDCW will also contain a pointer to the location
in object string where the procedure begins. In SPL, a label is always
local to the block where it occurs. However, the scope of the label
may be extended to an inner block if that block contains a Global
statement naming the label. Thus a Go To can be used to jump out of
a block. The IDCW for a label contains a pointer to the location in
the object string where execution is to continue.

Identifier Control Word Format

Meaning (the bit-position column is not recoverable)

- Control word (always 1 for IDCW)
- Start of Name Table (always 0 for IDCW)
- End of Name Table
- Local. Set for local variables and labels, and for procedures
- Indicates variable with value in Name Table when all four bits are
  set. The IDCW is a scalar value (not allowed for last IDCW in
  Name Table)
- Global. Set for global variables, labels, and procedures, and for
  direct formal parameters
- Variable with value in object string
- Variable is structure valued (if 1), scalar valued (if 0)
- Flag bits 12-15 used
- For variables, pointer to value (zero if space not yet assigned).
  For label or procedure, pointer to object string entry point
- If the bit is 0, two digit current pointer index used during
  subscription
- If bit 7 is 1, then bits 12-15 are:
    ON block feature enabled
    Identifier is a label
    Identifier is a procedure
    Identifier is an indirect formal parameter
- For identifiers with ON blocks, pointer to ON block code. For
  structures without ON blocks, current pointer address
- Unused

For procedures, the first entries in the Name Table will be the
formal parameters (if any), simply because they are the first identifiers
encountered by the Translator when scanning the source string.
SYMBOL implements call-by-name for all parameters. When a procedure
is called, the formal parameters are linked. Two mechanisms exist in
SYMBOL for linking parameters. The most general method (indirect
parameter) is to compile code near the calling point in the object string
to evaluate the actual parameter (commonly known as a thunk).
When the procedure is called, a pointer to the code for the actual
parameter is placed into the IDCW for the formal parameter.
Whenever the formal parameter is referenced, this code is executed and the
actual parameter is left on the top of the stack. (Part of the fixup for
recursion requires that a modified copy of this code be generated,
since it will in general contain absolute address references to the
original Name Table.) Often, however, the actual parameter is a simple
variable [14]. In the second mechanism (direct parameter), when the
procedure is called, the IDCW of the formal parameter is set up as if it
were a global variable with a pointer to the IDCW of the actual
parameter. The Translator determines which mechanism to use and
produces the appropriate instructions. Often the Translator chose to
compile an indirect parameter where a direct parameter would suffice
because it was too stupid.
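The effect of the indirect (thunk) mechanism can be imitated in a few lines. This is a loose analogy rather than SYMBOL code: a Python closure stands in for the code compiled near the calling point, and the names `Variable` and `call_by_name_demo` are hypothetical.

```python
class Variable:
    """A mutable cell standing in for a statically allocated SPL variable."""

    def __init__(self, value):
        self.value = value


def call_by_name_demo():
    """Show that a thunk is re-evaluated on every reference to the formal."""
    i = Variable(1)
    a = [10, 20, 30]
    evaluations = []

    def thunk():
        # Stand-in for the code compiled at the calling point for a[i].
        evaluations.append(i.value)
        return a[i.value]

    def procedure(formal):
        first = formal()   # reference the formal parameter: runs the thunk
        i.value = 2        # change a variable that the actual parameter uses
        second = formal()  # the same formal now denotes a different element
        return first, second

    return procedure(thunk), evaluations
```

The two references return different elements because the actual parameter expression is evaluated afresh each time, which is the defining behaviour of call-by-name. When the actual parameter is a simple variable no such code is needed, which is why the direct mechanism suffices in that case.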

SYMBOL provides a mechanism for trapping references to
variables, procedures and labels, called the ON block. The IDCW for an
identifier which has an ON block contains a pointer to the object string
for that ON block and a bit indicating whether or not the option is in
effect. This bit, which is initially set, may be cleared by a Disable
instruction (K2) or set by an Enable instruction (HI). If the identifier
is a variable name, the ON block will be invoked immediately after an
assignment to that variable occurs. If the identifier is a label, the ON
block will be invoked upon encountering a Go To statement to that
label before the transfer actually takes place. If the identifier is a
procedure, the ON block will be invoked upon encountering a call to that
procedure before entry takes place.

There is one more piece of information stored in the IDCW.
Recall that a vector stored in memory is a succession of arbitrarily long
scalar values placed end-to-end. Because the address of a component
of a vector cannot be calculated, finding the nth component would
mean scanning the preceding n-1 components. One of the mechanisms
used to speed up this search is called the current pointer. In the IDCW
for the structure is stored the subscript used in the last reference, and
the address of that component in memory. If the next reference is to a
component succeeding the last, the search begins where the last search
left off. The mechanism is somewhat limited because space for only
two digits is provided in the IDCW for the subscript.
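
A minimal Python sketch of the current pointer idea follows. It assumes that a cached subscript larger than 99 is simply not remembered (one plausible reading of the two-digit limit), and it models memory as a Python list; the Vector class and its step counter are hypothetical and exist only to show how many components are skipped.

```python
MAX_CACHED_SUBSCRIPT = 99  # only two digits of subscript fit in the IDCW

class Vector:
    def __init__(self, components):
        self.components = list(components)  # stand-in for end-to-end storage
        self.cached = None                  # (subscript, position) of last reference
        self.steps = 0                      # components skipped over so far

    def fetch(self, n):
        """Return component n (1-based), scanning forward from the best start."""
        subscript, position = 1, 0
        if self.cached is not None and n >= self.cached[0]:
            subscript, position = self.cached
        while subscript < n:
            subscript += 1
            position += 1
            self.steps += 1
        # Subscripts too large for two digits cannot be remembered (assumption).
        self.cached = (n, position) if n <= MAX_CACHED_SUBSCRIPT else None
        return self.components[position]

v = Vector(range(100, 300))
v.fetch(50)   # 49 steps from the start
v.fetch(52)   # 2 more steps, resuming at component 50
v.fetch(10)   # earlier component: 9 steps from the start again
```

Sequential forward access is cheap under this scheme, while a reference to an earlier component, or to one beyond the two-digit range, falls back to a scan from the beginning.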

Object  String 

In addition to the Name Tables, the Translator produces a single
memory string called the object string, which contains the code directly
executed by the Central Processor.

All language components have been translated into a post-fixed
string form in the object string. All variable references are made
through the data descriptors in the Name Table. The object string is
not at all altered while the program is running, and consists of
operands and operators. The operands are pushed onto a LIFO stack
as they are encountered. When an operator is encountered, it is
passed to the appropriate processor, which performs the operation on
the operands on the top of the stack, and replaces them with the result
(if any) of the operation.
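
The post-fixed execution model can be sketched in a few lines of Python. The operator names and the execute function are hypothetical stand-ins; real object strings hold opcodes and Name Table links rather than Python values.

```python
OPERATORS = {
    "ADD": lambda a, b: a + b,
    "SUBTRACT": lambda a, b: a - b,
    "MULTIPLY": lambda a, b: a * b,
}

def execute(object_string):
    """Run a post-fixed sequence of operands and operators on a LIFO stack."""
    stack = []
    for item in object_string:
        if isinstance(item, str) and item in OPERATORS:
            right = stack.pop()          # top of stack
            left = stack.pop()           # second from the top
            stack.append(OPERATORS[item](left, right))
        else:
            stack.append(item)           # operands are pushed as encountered
    return stack

result = execute([2, 3, "ADD", 4, "MULTIPLY"])  # postfix form of (2 + 3) * 4
```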

Each word of object string may contain two machine instructions,
one in each half of the word, each composed of an 8-bit operation
code and a 24-bit address field. The codes E0 through EE never
appear in the object string. Of the remaining M2 instructions, only six
use the address field as such: Block (90), If False Then Jump (B5),
Name Table Pointer (D0), Direct Parameter (D4), Indirect Parameter
(D5), and Transfer (D7). The Source Pointer (D9) instruction is
generated by the Translator with an address in the address field, but since
this instruction is treated as a No-op, the address field is not really
required. Some operations must always appear in the same half of the
word, so No-ops (00) are used to fill out the word where necessary.
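
The word layout can be sketched as bit packing in Python. The text gives only the field widths, so the placement of the operation code in the high-order byte of each half, and of the first instruction in the high-order half, are assumptions; pack_word and unpack_word are hypothetical helper names.

```python
def pack_word(first, second):
    """Pack two (opcode, address) pairs into one 64-bit word."""
    def half(opcode, address):
        if not (0 <= opcode <= 0xFF and 0 <= address <= 0xFFFFFF):
            raise ValueError("opcode is 8 bits, address is 24 bits")
        return (opcode << 24) | address
    return (half(*first) << 32) | half(*second)

def unpack_word(word):
    """Return [(opcode, address), (opcode, address)] for the two halves."""
    halves = ((word >> 32) & 0xFFFFFFFF, word & 0xFFFFFFFF)
    return [((h >> 24) & 0xFF, h & 0xFFFFFF) for h in halves]

NO_OP, BLOCK = 0x00, 0x90

# A No-op filling the first half, with a Block instruction in the second.
word = pack_word((NO_OP, 0), (BLOCK, 0x001234))
decoded = unpack_word(word)
```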

Blocks, Labels and End-of-Statement

The  first  instruction  of  each  block  in  the  object  string  is  the 
Block  instruction  (90).  The  block  entry  mechanism  does  not  occur  as 
a  result  of  this  instruction  however,  but  as  a  result  of  a  procedure  call 
or  ON  block  reference.  The  Block  instruction  is  always  placed  in  the 
second half of a word. The address field contains the address of the
block's Name Table. The accompanying first halfword contains a No-op
instruction.

Each  block  ends  with  an  End  Block  instruction  (B7).  When  this 
instruction  is  encountered,  the  block  exit  mechanism  is  invoked:  The 
current  stack  is  deleted  and  the  calling  block's  stack  becomes  the  new 
current stack. From this stack, the status register and program location
counter are restored. (Generally, an End Block instruction is preceded
by a Return instruction, which also invokes the block exit
mechanism.)  The  stack  for  the  main  program  is  tagged  so  that  a  block 
exit  from  the  main  program  causes  a  normal  program  completion  shut¬ 
down  of  the  Central  Processor. 

A  Block  instruction  also  appears  in  the  object  string  at  each  label 
entry point. The IDCW for that label contains the address of this
Block instruction. Whenever a Block instruction is encountered, the
contents of this instruction's address field are compared to the location
of the current Name Table. For a Go To within a block or for block
entry, these two addresses will match, but not for a Go To across
block boundaries. In that case, the Central Processor presumes that the
Go To is directed towards a block which directly or indirectly called the
currently active block, and performs block exits until the proper block
is found. If the target of the Go To is not within one of these blocks,
the main program will eventually be exited, and the Central Processor
will  shut  down  as  if  a  normal  completion  had  occurred. 
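
The search for the target block can be sketched as follows. The call chain is modeled as a Python list of Name Table addresses, innermost block last; the function name go_to and the use of None to signal that the main program was exited are hypothetical conventions for this sketch.

```python
def go_to(call_chain, target_name_table):
    """Count the block exits needed to reach the block owning the label.

    call_chain lists the Name Table addresses of the active blocks,
    main program first and currently active block last. Returns None
    when the main program itself is exited, which the Central Processor
    treats like a normal completion.
    """
    chain = list(call_chain)
    exits = 0
    while chain:
        if chain[-1] == target_name_table:
            return exits            # addresses match: continue at the label
        chain.pop()                 # block exit
        exits += 1
    return None

active = [0x100, 0x200, 0x300]
same_block = go_to(active, 0x300)
outer_block = go_to(active, 0x100)
unknown = go_to(active, 0x999)
```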

For  each  semicolon  or  END  statement,  an  End  Statement 
instruction  (BB),  and  a  Source  Pointer  instruction  (D9)  are  placed  into 
the object string. The address field of the Source Pointer instruction
contains the address of the last word of the source statement in the
source string. The Central Processor treats this instruction as a No-op.
The intended use of the Source Pointer instruction was to facilitate
debugging by linking the location of an execution error to the offending
source statement. The use of this facility was abandoned when
software to decompile the object code directly to source code was
developed, which provided precise resolution of the error location
within the source statement, and because there were problems inherent
in the Source Pointer mechanism.15,16 When the End Statement
instruction is encountered, the stack is cleared of any remaining
operands (simply by resetting the top-of-stack pointer), and user
interrupts (if any) are handled.

Scalars and Structures in the Object String

The  String  Start  codes,  F0  through  F5,  always  appear  in  the  first 
byte  of  a  word,  and  indicate  the  beginning  of  a  scalar  value  in  data 
string or numeric field format. If the value is one word long (indicated
by a set, high order bit in the last byte), then the word is pushed
onto the stack. Otherwise, a word beginning with an Efl and containing
the address of the first word of the string is pushed onto the stack,
and successive words of the object string are fetched (and discarded)
until a word with a set, high order bit in the last byte is found. The
String  End  character,  F6,  may  appear  in  any  byte  of  a  word,  but  is  not 
used  in  searching  for  the  last  word  of  a  string. 
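
The end-of-string test can be sketched in Python. The eight-byte word size is an assumption based on the two 32-bit instruction halves described earlier, and last_word_index is a hypothetical helper; only the rule itself (the high order bit of the last byte marks the final word) comes from the text.

```python
WORD_BYTES = 8  # assumed word size: two 32-bit halves

def last_word_index(words, start):
    """Index of the final word of the scalar that begins at words[start]."""
    i = start
    # Skip words until one has the high order bit set in its last byte.
    while not words[i][-1] & 0x80:
        i += 1
    return i

one_word = [bytes([0xF0, 0, 0, 0, 0, 0, 0, 0x81])]
three_words = [
    bytes([0xF0, 0, 0, 0, 0, 0, 0, 0x00]),
    bytes(WORD_BYTES),
    bytes([0, 0, 0, 0, 0, 0, 0, 0x80]),
]
short_end = last_word_index(one_word, 0)
long_end = last_word_index(three_words, 0)
```

Note that the String End character (F6) plays no part in this scan, in keeping with the description above.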

The codes FC through FF have been described earlier in connection
with initial structure values. These codes may be used to construct
structure values on the stack as well. The scalar components of
these structures in the object string may be arbitrarily complex
expressions. Adjacent scalar components are separated by the Field
Mark operator (DC). When a word beginning with one of the characters
FC through FF is encountered, that word is pushed onto the stack.
The expressions are evaluated just as if no structure operators had been
encountered, and the result, or a link to it, is left on the stack. At a
later time, the Reference Processor must convert the linear structure
value on the stack into tree format.

Name  Table  Pointer  Instruction 

The Name Table Pointer instruction (D0) is used for all references
to variables, labels, and procedures. The address field of this
instruction points to an IDCW, which is examined by the Reference
Processor when this instruction is encountered. The action taken
depends on what is found in the IDCW.

For variables and labels, a link word is pushed onto the stack.
This word contains the pointer to the IDCW, and begins with a character
that reflects the information found in the Name Table: E0 (link
to simple variable), E1 (link to structure), or E4 (link to simple variable
with value stored in Name Table). If the Name Table Pointer
instruction is preceded by an IN instruction (UK), the link word will
begin as follows: EA (IN reference to simple variable), EB (IN reference
to structure), or EE (IN reference to simple variable with data
stored in Name Table). For a label, the link word begins with the
character E7.

A variable reference may be followed by a subscript list. Expressions
to evaluate each subscript are followed by an Integerize operator
(DA) or a Colon operator (BA), as described above. Following this
subscript list is the Perform Subscription instruction (DD). Actual
evaluation of this subscripted variable reference is deferred until the
value or location absolutely must be bound to continue, at which time
the Reference Processor will perform the subscription. A major change
to the original design was made when problems associated with an
earlier binding were encountered.17

Arithmetic  Operations 

When an arithmetic operator is encountered in the object string,
the Format Processor first converts the operands to numeric field
format (if necessary), and then the Arithmetic Processor (or the Format
Processor for the Absolute Value, Negate, and Format operators)
carries out the operation.

The Add (AB), Subtract (AD), Multiply (AA), and Divide
(AF) operators cause the top two operands on the stack to be replaced
by their sum, difference, product, or quotient, respectively, in numeric
field format. The value is either stored directly on the stack, or a link
to the temporary value is stored on the stack if the result contains
more than nine significant digits.

A two digit Limit register places an upper limit on the number
of significant digits to which these four operations are carried out, and
hence, an upper limit on the precision of the results. (The precision of
the result may be less than this limit, depending on the precision of the
operands.) This register may be read or written by software, and is
treated as a symbolic variable. The Limit instruction (BC) causes a
word to be pushed onto the stack beginning with a BC. This word is
later converted to the two digit value in data string format if the value
is being read. A one bit Limited flag is set or cleared as a result of
these operations depending on whether or not the precision of the
result would have been more than the Limit register allowed. This
flag can only be read by software. When the Limited instruction (9K)
is encountered, the value is pushed onto the stack as a "0" or "1" in
data string format, and the flag is cleared.

There are six numeric comparison operators: Equal to (BD), Not
Equal to (9D), Greater than (9E), Less than (9t), Greater than or
Equal to (9B), and Less than or Equal to (9A). These operators cause
the top two operands on the stack to be replaced by a "1" or a "0" (in
data string format) based on the outcome of the comparison. When
numbers of unequal precision are compared, the comparison is carried
out to the precision of the least precise operand.

The two monadic arithmetic operators, Absolute Value (tu) and
Negate (DB), simply alter the sign of the top operand on the stack as
required. The Format operator converts the numeric operand second
from the top of the stack to data string format using a control string on
the top of the stack as a template.18

Character  String  Operations 

The character string operations are carried out by the Format
Processor, which will also unpack numbers (if necessary) from numeric
field format to data string format before proceeding.

The Join operator (HF.) replaces the two string operands on the
top of the stack with a string formed by concatenating the operands.
The Mask operator (NI) is a general purpose string editing operator.19
The operand on the top of the stack is used as an editing template on
the second operand. The result replaces the operands on the stack.

There are three character string comparison operators: Before
(NN), Same (HV), and After (HA). As for the arithmetic comparison
operators, the two operands are replaced on the stack by a "1" or a "0"
(in data string format) based on the outcome of the comparison. Two
strings must be of equal length, as well as contain the same characters
in the same order, for the result of the Same operator to yield a "1".
The Before and After operators compare two strings based on a special
collating sequence (null character, special characters,
AaBbCc...XxYyZz012...789) rather than on the magnitude of the
internal eight bit ASCII representation as is customarily done. When
comparing unequal length strings, the shorter string is considered to be
padded on the end with null characters.
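
The collating sequence can be sketched in Python as a ranking function. The relative order of the special characters among themselves is not given in the text, so ASCII order is assumed for them; the function names are hypothetical, and Python booleans stand in for the "1" and "0" results.

```python
import string

# Letters interleaved by case: AaBbCc...Zz
LETTERS = "".join(u + l for u, l in zip(string.ascii_uppercase,
                                        string.ascii_lowercase))
DIGITS = string.digits

def rank(ch):
    """Position of a character: null, specials, letters, then digits."""
    if ch == "\0":
        return (0, 0)
    if ch in LETTERS:
        return (2, LETTERS.index(ch))
    if ch in DIGITS:
        return (3, DIGITS.index(ch))
    return (1, ord(ch))  # assumption: specials keep their ASCII order

def collation_key(s, width):
    # The shorter string is padded on the end with null characters.
    return [rank(c) for c in s.ljust(width, "\0")]

def before(a, b):
    width = max(len(a), len(b))
    return collation_key(a, width) < collation_key(b, width)

def after(a, b):
    return before(b, a)
```

Under this ordering "a" sorts before "B" and "Z" before "0", the reverse of what a comparison of raw ASCII codes would give.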

Boolean Operations

There are three Boolean operators: Not (Hll), And (HO) and Or
(HD). The operands used in Boolean expressions are character strings
formed from the three characters "0", "1", and the space character
(which is ignored). The Not operator replaces the top operand on the
stack with a string (or link to a string) formed from the operand by
converting each "0" to a "1", each "1" to a "0" and removing each space
character. The And and Or operators replace the top two stack
operands with their bitwise conjunction or disjunction, respectively.
When these latter two operators are used on unequal length operands
(excluding spaces), the shorter operand is considered to be padded on
the end with "0"s.
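
The Boolean string semantics can be sketched in Python. The function names are hypothetical, and raising ValueError for characters outside "0", "1" and space is an assumption made for the sketch, since the hardware behavior for such operands is not described here.

```python
def bits(operand):
    """Strip ignored spaces and check that only "0" and "1" remain."""
    stripped = operand.replace(" ", "")
    if set(stripped) - {"0", "1"}:
        raise ValueError("not a Boolean string")
    return stripped

def boolean_not(operand):
    return "".join("1" if c == "0" else "0" for c in bits(operand))

def _combine(a, b, op):
    a, b = bits(a), bits(b)
    width = max(len(a), len(b))
    # The shorter operand is padded on the end with "0"s.
    a, b = a.ljust(width, "0"), b.ljust(width, "0")
    return "".join(op(x, y) for x, y in zip(a, b))

def boolean_and(a, b):
    return _combine(a, b, lambda x, y: "1" if x == "1" and y == "1" else "0")

def boolean_or(a, b):
    return _combine(a, b, lambda x, y: "1" if x == "1" or y == "1" else "0")
```

For example, the And of "11" and "1011" treats the first operand as "1100" and yields "1000".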

The Format Processor is responsible for executing the Boolean
operations. As for the character string operations, operands will be
converted to data string format if necessary (since, for example, "100"
could be both the bit string of length three, and the integer following
99).

Assignment  Operations 

There are two assignment operators: Left Assign (0F), and
Right Assign (1F). The former assigns the value indicated by the top
operand on the stack to the location indicated by the operand second
to the top of the stack; the latter assigns in the opposite direction.
Until one of these operators is encountered in the object string, it is
not known whether the preceding operands are to be used for locations
or values. This is why pointers to IDCW's are used on the stack for
variables, rather than the variable's value or location. Before the
assignment is carried out, any links to values are converted to actual
values on the stack, even if this may require more than one word.
This operation, and the final assignment of value to location, are
performed by the Reference Processor. For a structure assignment
statement, the value appears on the stack as a structure in linear format.
The Reference Processor is responsible for converting this value to tree
format as it stores the value.

Transfer and Go To Instructions

The Transfer instruction (D7) simply resets the program location
counter to the value in the instruction's address field. This instruction
is generated to detour around code in the object string which is not to
be executed in-line, such as initial data values, internal blocks, Else
clauses, and code to evaluate actual parameters. It is not generated as
a result of the SPL Go To statement however.

There is also a conditional transfer instruction which is generated
for each SPL If statement, called the If False Then Jump instruction
(B5). It is used to jump around the code for the Then clause, to the
code for the Else clause (if any) if the conditional expression in the If
statement is false. Preceding this instruction there will be an
expression which should result in a single Boolean value on the top of the
stack. (Anything else will cause a processing error shutdown.) This
value is tested and if it is a "0", then the program location counter is
set to the value in the instruction's address field. Otherwise, execution
continues at the instruction following the If False Then Jump
instruction.
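
How the two jump instructions cooperate in an If statement can be sketched with a toy interpreter in Python. The PUSH and EMIT operations, the tuple encoding, and the use of list indices as addresses are all hypothetical; only the behavior of TRANSFER and IF_FALSE_JUMP follows the description above.

```python
def run(program):
    """Execute a list of (operation, argument) pairs; return emitted items."""
    stack, emitted, pc = [], [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "EMIT":
            emitted.append(arg)
        elif op == "TRANSFER":
            pc = arg                      # unconditional: reset the counter
        elif op == "IF_FALSE_JUMP":
            value = stack.pop()
            if value not in ("0", "1"):
                raise RuntimeError("processing error shutdown")
            if value == "0":
                pc = arg                  # skip the Then clause
        else:
            raise RuntimeError("unknown operation")
    return emitted

def if_statement(condition):
    return [
        ("PUSH", condition),      # 0: conditional expression
        ("IF_FALSE_JUMP", 4),     # 1: jump to the Else clause when "0"
        ("EMIT", "then"),         # 2: Then clause
        ("TRANSFER", 5),          # 3: detour around the Else clause
        ("EMIT", "else"),         # 4: Else clause
    ]

taken = run(if_statement("1"))
skipped = run(if_statement("0"))
```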

A Go To instruction (05) is generated for each Go To in the
source program. Unlike the two jump instructions, it contains no
address in its address field. The target of the Go To is found
indirectly from the top operand on the stack. This operand may be a
link to a label (containing a pointer to the IDCW for a label), or a
simple or subscripted variable reference. The Reference Processor is
called on to evaluate the variable reference and place a label value on
the stack. Recall that label values also contain pointers to the IDCW's
of labels. Until the IDCW is examined, it is not known whether the
label has been defined, or even if the IDCW is for a label at all. If the
IDCW is not for a defined label, a processing error shutdown will
result. Otherwise, the IDCW will contain the address of a word
containing a Block instruction where execution will continue as described
above.

Procedure  Call,  Parameters,  and  Return 

The code in the object string for a procedure call (or function
reference) will in general consist of three parts: a Name Table Pointer
instruction for the procedure, code to evaluate any indirect parameters,
and parameter instructions. The code for indirect parameters is not
executed in-line, so if there are any indirect parameters, then a
Transfer instruction follows the Name Table Pointer instruction,
directed at the first parameter instruction to be executed. Then, for
each actual indirect parameter, there is the code to evaluate that
parameter, followed by a Parameter Return instruction (D8). (This
instruction is of course used when the parameter is referenced to signal
the end of the actual parameter code.)

Lastly, if there are any direct or indirect parameters, there are
the parameter instructions, which will appear in the object string, two
per word, in the order opposite to the order in which the corresponding
parameters appear in the source program. For indirect parameters,
there will be an Indirect Parameter instruction (D5) containing the
address of the code to evaluate the actual parameter. For direct
parameters, there will be a Direct Parameter instruction (D4) containing
the address of the IDCW for the actual parameter.

After the Name Table Pointer instruction is encountered, the
Reference Processor informs the Instruction Sequencer that the identifier
is for a procedure. The Instruction Sequencer then begins looking
for parameter instructions, ignoring No-ops and executing any Transfer
instructions. The parameter instructions are pushed onto the stack,
one per word. The first instruction which is not a No-op, Transfer, or
parameter instruction will be executed on return from the procedure
(the return point). The parameter instructions are then popped from
the stack (note that the stack operations reverse their order); the
IDCW's of the formal parameters are modified as described above. If
the number of formal parameters does not equal the number of actual
parameters, a processing error shutdown occurs. Following these
operations, a block entry is made at the start of the procedure's object
code, which is found from the procedure's IDCW.

A reference to a direct formal parameter is identical to a reference
to a global variable, label, or procedure. When an indirect formal
parameter is referenced, a word is pushed on the stack containing
the state of the Instruction Sequencer. Program execution then continues
at the address designated in the formal parameter's IDCW, just as
if no parameter reference was in progress. (The code to evaluate the
actual parameter may itself contain indirect parameter references.)
When the Parameter Return instruction (D8) is encountered, the top
of stack register contains the actual parameter (value or address) or a
link to it. The state of the Instruction Sequencer is restored from the
topmost word of the stack in memory. Program execution then continues
as before the parameter reference.

The Return instruction may or may not return a value (or
location), which may or may not be used. The internal top-of-stack
register will contain the operand to be returned, if any. The block exit
mechanism invoked by the Return instruction (D6) will delete the
memory space occupied by the current stack, but will not clear this
register. So this register becomes the top of stack for the calling block.
If the calling block is expecting a value, and the register is empty, a
processing error shutdown will result. If a value is returned, and none
is required, no processing error shutdown will result. This is because
the End Statement instruction, which will follow a simple procedure
call, clears all operands from the stack, including the contents of the
top-of-stack register.

Input  and  Output 

Input and Output operations transfer and transform information
between memory and the outside world. Six I/O status bits are
maintained by the Instruction Sequencer to indicate the type and mode of
the I/O operation. When the Input instruction (80) is encountered,
the Input I/O status bit is set, and the remaining bits are cleared. The
Output instruction (81) causes the Output I/O status bit to be set and
the others to be cleared.

The remaining four bits are used to indicate the I/O mode.
Following the Input or Output instruction in the object string there may
be a String instruction (A3), a Data instruction (A1), or, for Input
only, an Exact instruction (A4) or an Empirical instruction (A5). For
each of these instructions there is a corresponding I/O status bit which
is set when the instruction is encountered. The List mode is indicated
by the absence of any other I/O mode.

The I/O mode determines the type of data transformation to be
performed. In memory, the data may be a scalar value in data string
or numeric field format, or a structure in tree format. In the outside
world, the data exists as an ASCII character string. Structures in the
outside world are represented explicitly using the characters "<" and
">" to delineate each structure or substructure, and the field mark
character "|" is used to separate adjacent scalar components. It is the
Input/Output Processor's responsibility to transform data between this
explicit structure format, and the internal, linear format, if the I/O
mode calls for such a transformation.

Data may be directed to or from a number of different I/O devices.
If the default device, with associated device number zero, is not
to be used, then following the I/O type and mode instructions, there
will be code for an expression for the device number, followed by a To
instruction (A0) for output, or a From instruction (Hll) for input.
(After the To or From instruction, a Comma instruction is expected
but ignored.) The code to evaluate the expression is executed, and a
value or link is left on the stack. The To or From instructions force
the value to be placed on the stack, and then to be integerized, as for
subscripts. The two least significant (BCD) digits are extracted and
designated as the device number for the operation. The Channel
Controller associates devices with device numbers.
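
The extraction of the device number can be sketched in one line of Python. The sketch assumes that integerizing truncates toward zero and that the sign is ignored; neither detail is stated in this passage, and device_number is a hypothetical name.

```python
def device_number(value):
    """Two least significant decimal digits of the integerized value."""
    # Assumptions: integerize truncates toward zero; the sign is dropped.
    return abs(int(value)) % 100

default_device = device_number(0)
wrapped = device_number(1234)     # only the digits "34" survive
fractional = device_number(305.9)
```

A consequence of keeping only two digits is that at most one hundred distinct device numbers can be addressed.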

Next to appear in the object string will be the I/O items,
separated by Comma instructions (AC). The Comma instruction
causes the value of the preceding I/O item to be placed on the stack
for output; for input, it causes the Input/Output Processor to get an
input value which is then assigned to the preceding I/O item. The last
I/O item is followed by either an End Statement or an End Block
instruction, each of which is treated as having been preceded by a
Comma instruction. In addition, for output, these two instructions
cause the Input/Output Processor to output the values on the stack,
starting at the bottom and ending at the top.

For Input Data, there will be no I/O items since both variable
names and values come from the outside world. Each I/O item for
Output Data will be a simple variable reference (Name Table Pointer
instruction). For the remaining input modes, an I/O item may be a
simple or subscripted variable reference or a procedure reference
(which must eventually return a simple or subscripted variable
reference). For the remaining output modes, an I/O item may be any
expression. The code for each I/O item is executed exactly as if no I/O
instructions had been encountered.

For all output modes, the actual value of the I/O item must be
placed on the stack. A scalar value in numeric field format is
converted to data string format by the Format Processor. If the value is
for a structure, a temporary stack is created onto which the Reference
Processor places the value, converting it from tree format to linear
format. The structure is then copied to the regular stack and any
numeric scalar components are converted to data string format.

For Output Data, the variable's name must be placed before the
value on the stack. The name is found in the one or more words
preceding the variable's IDCW, a pointer to which will exist in the top
of stack register as a result of the just executed Name Table Pointer
instruction. If the variable is scalar valued, a word is placed before
and after the value of the variable on the stack beginning with the
characters I'll and FI, respectively. The Input/Output Processor
converts each of these words to the field mark character to delineate
the value in the output.

For Input Data, the Input/Output Processor calls on the Translator
to extract the variable names and values from the input character
string and perform the assignment. For Input List and Input String,
the Input/Output Processor will leave a value on the stack on top of a
link word pushed onto the stack as a result of executing the code for
the I/O item. The Reference Processor is called on to perform an
assignment operation, just as if a Left Assign instruction had been
encountered.

The Exact and Empirical input modes are used to convert input
values to numeric field format. A temporary stack is created onto
which the Input/Output Processor places the input value. The value is
moved to the regular stack and the scalar value, or each scalar
component, which must be a number, is converted to numeric field format.
If a precision tag was given in the input value, it is used in the
conversion; otherwise, the precision is determined by the input mode. The
assignment operation is then carried out as for the List and String
input modes.

Pause  and  System 

The Pause instruction (96) and System instruction (97) cause the
Central Processor to load the error code register with the one byte
opcode and then shut down. The hardwired System Supervisor notices
that the Central Processor has shut down and examines the error code
register. If the instruction was Pause, then the System Supervisor
deletes the process from the Central Processor's run queue. This has
the effect of halting the execution of that process. A paused process is
restarted  when  the  user  presses  the  Continue  button  on  his  terminal. 1  ’ 
If the instruction was System, then the System Supervisor executes a
previously defined memory string of control words. The System
instruction is used in "privileged" software to modify low level system
data structures which are normally maintained by hardware. Adding
or deleting a process from a processor queue is a typical example of
the use of the System instruction.

Logical Memory

As mentioned previously, SYMBOL differs from most von Neumann
computers in that the memory structure is not organized as a
contiguous set of sequentially numbered storage cells. Instead, a
"logical memory" structure implemented by the Memory Controller is
imposed on top of the virtual memory system. The Memory Controller
takes each virtual page and divides it up into three sections.
The first four words of the page are the "Page Headers", which contain
pointers and status information. One of the pointers links pages
together in a forward linked list; SYMBOL nominally used three of
these "Page Lists" for each terminal.6 The first Page List was for the
user's source program, the second for user data, the stack and Name
Tables, and the third for Object String. The Page Headers also
indicate available space in the page by status bits, and outside the
page by a Space Available List pointer which points to the next page
that contains available space. The remainder of the page is divided into
twenty-eight eight-word "Groups", and twenty-eight "Group Link
Words". The Memory Controller then organizes these contiguous eight
word "groups" into memory strings with a doubly linked list. All of
the processors other than the Memory Controller then view this logical
memory structure as the fundamental memory organization of the
machine.
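
The page layout above can be checked with a little arithmetic and a toy
model of a memory string. The word counts come from the text; the
names and the Python representation are assumptions made for
illustration. The counts sum to 256 words, which suggests (but the text
does not state) a 256-word page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEADER_WORDS = 4           # the "Page Headers"
GROUPS_PER_PAGE = 28       # eight-word "Groups"
WORDS_PER_GROUP = 8
LINK_WORDS_PER_PAGE = 28   # one "Group Link Word" per group


def words_per_page() -> int:
    """Total words accounted for by the three sections of a page."""
    return HEADER_WORDS + GROUPS_PER_PAGE * WORDS_PER_GROUP + LINK_WORDS_PER_PAGE


@dataclass(eq=False)
class Group:
    """One eight-word group with forward and backward links."""
    words: List[int] = field(default_factory=lambda: [0] * WORDS_PER_GROUP)
    prev: Optional["Group"] = None
    next: Optional["Group"] = None


def link_string(groups: List[Group]) -> Group:
    """Chain groups into a doubly linked memory string; return the first group."""
    for earlier, later in zip(groups, groups[1:]):
        earlier.next, later.prev = later, earlier
    return groups[0]


def read_string(first: Group) -> List[int]:
    """Read a whole string by following forward links, as a processor would."""
    out: List[int] = []
    group = first
    while group is not None:
        out.extend(group.words)
        group = group.next
    return out
```

The groups of one string need not be adjacent in a page, which is why
the other processors only ever see the linked view.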

The Privileged Memory Operations

There are sixteen instructions which operate directly with
memory addresses to read or alter storage. These instructions are
issued by the hardware processors or by systems programs which have
been translated in "privileged" mode. A request for memory in the
Memory Controller consists of a 92 bit value consisting of three fields.
First is a four bit field for the Page List, followed by a twenty-four bit
absolute virtual address field, and a sixty-four bit data field. These
fields are transmitted to the Memory Controller, which may use and
modify them, returning them to the originating processor. Each
memory request is also accompanied by the terminal number. Because
words in memory are not necessarily contiguous, no address indexing
calculations can be performed. For this reason the memory operations
are of the flavor, "Here is an address, get me the data at that address
and tell me what the address of the next word is." More specifically the
sixteen memory operations are as follows:

Assign Group: Used to allocate a new memory string. If the
transmitted address is non-zero, the Memory Controller will try to
allocate a group from the same page. If no group is available on that
page, an empty group will be looked for by following the Space
Available List pointer to a page which has free space. If there are no
pages on the Page List with space, a new page will be allocated from
the system Available Page List. If the transmitted address field is
zero, then the Memory Controller will allocate a group from the same
Page List as specified in the page list field. If the transmitted page
list field and the address field are both zero, then a new page is
allocated and the first group on the page will be allocated. The
returned address is the address of the first word of the assigned group.

Fetch and Follow: Returns the data at the specified address and
returns the address of the following word in the string.

Fetch Reverse: Returns the preceding word in the string and its
address.

Follow and Fetch: Returns the data and address of the word following
the specified address.

Store and Assign: Stores the data at the indicated address and returns
the address of the successor word. If no successor word exists, a new
group is allocated as indicated in the Assign Group instruction, and is
linked onto the current storage string.

Store Only: Stores the data at the indicated address. The returned
address is changed by adding one to the low order three bits modulo
eight. (This has the effect of wrapping the address around in the
group.)

Store and Insert: Stores the word in the indicated address and returns
the successor address. If the transmitted address specifies the last word
of a group, then a new group is allocated and inserted between the
group of the transmitted address and the group which followed it.

Insert Group: A new group is allocated and inserted after the group
specified by the transmitted address. The returned address is that of
the first word of the new group.

Delete String: Deletes a memory string; the transmitted address must
be that of the first group of the string. The associated Page List must
also be supplied so that the string, when reclaimed, can be returned to
the proper available space list. If the Page List supplied is that of the
user data, then pointers to substructures will be looked for, and that
space will be deleted also.

Delete to End of String: Obtains the address of the succeeding group
and reclaims that and all following groups. The associated Page List
must also be supplied so that the reclaimed part of the string can be
returned to the proper available space list. If the Page List supplied is
that of the user data, then pointers to substructures will be looked for,
and that space deleted also.

Delete Page List: The Page List supplied will be reclaimed for the
terminal on which the request was made. This operation is handled by
the Memory Reclaimer.

Reclaim Group: If the transmitted address is zero, fetch the top of the
terminal's garbage stack; otherwise link the group onto its page's
available group list.

Fetch Direct: Used for fetching one of the terminal header registers,
or any absolute core address (rather than a virtual address). The data
at the real memory address is returned. The returned address is
changed by adding one to the low order three bits modulo eight.

Store Direct: Stores the data at the real memory address given. The
returned address is changed by adding one to the low order three bits
modulo eight.

Fetch Terminal Header: Used to fetch one of the 21 header registers
associated with each terminal. The Fetch Terminal Header instruction
differs from the Fetch Direct instruction in that the Memory Controller
automatically inserts the terminal number into the address field. This
allows the address of a particular terminal header to be specified in a
terminal independent manner. The address fetched is the specified
physical address with the terminal number added in shifted left by
three bits.

Store Terminal Header: Used to store one of the 21 header registers.
The address stored into is the specified physical address with the
terminal number added in shifted left by three bits.
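
The request format and the two address adjustments described above can
be written out as bit arithmetic. The field widths are taken from the
text; the ordering of the fields within the 92 bit value (Page List in
the most significant bits), the function names, and the reading of
"added in shifted left by three bits" as `address + (terminal << 3)` are
assumptions made for this sketch.

```python
PAGE_LIST_BITS, ADDRESS_BITS, DATA_BITS = 4, 24, 64  # 92 bits in total


def pack_request(page_list: int, address: int, data: int) -> int:
    """Pack the three fields of a memory request into one 92 bit integer."""
    for value, bits in ((page_list, PAGE_LIST_BITS),
                        (address, ADDRESS_BITS),
                        (data, DATA_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in %d bits" % bits)
    return (page_list << (ADDRESS_BITS + DATA_BITS)) | (address << DATA_BITS) | data


def unpack_request(value: int):
    """Split a packed request back into (page_list, address, data)."""
    data = value & ((1 << DATA_BITS) - 1)
    address = (value >> DATA_BITS) & ((1 << ADDRESS_BITS) - 1)
    page_list = value >> (ADDRESS_BITS + DATA_BITS)
    return page_list, address, data


def wrap_in_group(address: int) -> int:
    """Add one to the low order three bits modulo eight.

    This is the returned address for Store Only, Fetch Direct and
    Store Direct: the address wraps around inside its eight-word group.
    """
    return (address & ~0b111) | ((address + 1) & 0b111)


def terminal_header_address(physical_address: int, terminal_number: int) -> int:
    """Physical address with the terminal number added in, shifted left by three."""
    return physical_address + (terminal_number << 3)
```

For example, `wrap_in_group` maps the last word of a group back to the
first word of the same group rather than into the next group.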

Conclusion

The SYMBOL instruction set has now been described in enough
detail to show the complexities of implementing a high level instruction
set in hardware. Many further details exist, but these are relatively
minor, and would probably not be of interest to the reader. One of
the reasons that the instruction set had not been described earlier was
that the users of the machine were not supposed to need to know
about the machine level instruction set.

SPL  wee  the  todgiiaito  bH  die  macHine. 
actually  turned  dpi  to  be  iMcdM.  few  l 
tiun  set,  Only  by  a  pi  Maudlin  g#«MUt  !l 
tlon  details  of  the  lemiuction  tot  vret#  the  t 
many  InaffUctowitea  which  ekMmMn fiSrmstohiiM'  Thrt,.  _ 
is  far  from  Ideal;  po$nm  metouwmenh1*  torn*  th*Tlto'4nttsdldg\ 
used  wax  very  Ineffideriti  frahe  of  toe*  toafBtoxnetm-  tie  eacueed  i 

when it is realized that SYMBOL was an experimental machine, and
that the designers were limited in the effort they could expend in
optimization. Nevertheless, we feel it is important that the instruction
set be documented as it was implemented. This paper, used in
conjunction with the previously published papers, completes the
documentation

ABSTRACT

The development of debugging tools on the high
level language SYMBOL computer is described. The
software system developed allows a detailed interactive
investigation of the dynamic and static program
structure and user variables entirely at the source
program level for a procedural block-structured
programming language. Source statements are "de-compiled"
from the object code; descriptors and hardware
maintained type tags allow the unambiguous interpretation
of data values. Language constructs in the SYMBOL
programming language which aid in debugging are
also described. Comments are made on the evaluation
of the system and how the debugging environment
was affected by the high level language architecture
of the SYMBOL machine.

Introduction 

One of the motives often suggested for High Level Language
Computers has been that they make program debugging easier. The
high level language SYMBOL computer system1,2,3 provided a unique
opportunity to test out this hypothesis. In this paper the
state-of-the-art debugging tools developed for SYMBOL will be presented,
along with a description of how and why these tools were developed. The
exposition of these debugging tools is important for two reasons. First,
it documents how debugging was achieved on perhaps the most
advanced high level language computer yet constructed. Second, it
completes documentation on what many users observed to be the most
important feature of SYMBOL - the high level language programming
and debugging environment. Only by examining this user visible system
software imposed on top of the SYMBOL architecture can one make a
judgement on the effect the high level architecture had on the
debugging environment.


Unveiled in 1971, the SYMBOL computer system had as its
prime goal to demonstrate with a full-scale working computer that a
procedural general-purpose programming language and a large portion
of a time-shared operating system could be implemented directly in
hardware. This approach was intended to show a marked improvement
in computational rates over conventional systems. Almost every
aspect of the system was unique, from its eight special function
processors to the totally new SPL4 programming language. While the
system was capable of running totally without system software this was
rarely done. System software for SYMBOL greatly contributed to the
"system" interface which appeared to the user as independent of the
hardware or software implementation.

†Work done at Iowa State University under NSF grant OJVMW7X

Early  Debugging 

Without software SYMBOL provided no facilities for debugging
programs with execution errors. Consequently one of the most useful
programs early in the project was a traditional interactive memory
dump. This fairly small program was effective for program debugging
if one was familiar with the high level instruction set and data
organization. Upon detection of an execution error the System Supervisor
would suspend the user program, save the 24 "header" registers in a
known place, and then start up a "Monitor" program on the user's
terminal. From the Monitor the user could enter the dump program to
look at his dead program and its source. As with most users, the first
questions to be answered are why and where. The first place to look
was in the AH1 header, because one of the bytes was an "Error Code
Character"; this was then translated to English by looking at one of the
many Engineering Reference Cards lying around. Once the nature of
the error was determined the current object code address was taken
from the left half of the AH2 header. By dumping five or six words
of object code at this address the user could usually encounter a Source
Pointer opcode, generated by the Translator at each semicolon in the
source program. The address field of the Source Pointer instruction
was an absolute address pointing to the corresponding line in the
original source code. This entire process could be done at a terminal in
less than a minute.
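
The last step of this manual procedure, scanning a few words of object
code for a Source Pointer, can be sketched as follows. Object code is
modeled here as a list of (opcode, address field) pairs indexed by
address; that representation, the opcode name, and the function name
are assumptions for illustration only.

```python
# Hypothetical name for the opcode the Translator emits at each semicolon.
SOURCE_POINTER = "SOURCE_POINTER"


def find_source_address(object_code, error_address, window=6):
    """Scan up to `window` words of object code starting at the failing
    address and return the address field of the first Source Pointer,
    or None if no Source Pointer appears within the window."""
    for opcode, address_field in object_code[error_address:error_address + window]:
        if opcode == SOURCE_POINTER:
            return address_field
    return None
```

The default window of six words mirrors the "five or six words" that
were usually enough when dumping by hand.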

The Source Pointer Problem

Using the source pointers left by the Translator to find the source
line was straightforward, and so was soon programmed into the
terminal Monitor. Error messages were also automatically translated to a
more understandable English message. While this system of tracking
down the source was simple it had the proverbial "Achilles' heel" that
made it untrustworthy and potentially dangerous. A program could be
interrupted, the Monitor and its subsystems invoked, and then the
program could be resumed at the point of interruption. If the user edited
the program source and then resumed execution of the object code, the
source pointers in the object code were potentially invalid. The
situation is not unlike the "dangling reference" problem encountered in
block structured languages.9 In short-lived user programs this turned
out to occur infrequently; systems programs on the other hand might be
executing the same object code for weeks while the source was being
modified, allowing source and object files which differed greatly in
content and date of last modification. Debugging systems programs
then brought us back to square one.

More Problems

SYMBOL provided a rather inadequate hardware text editor
which was limited to specially designed terminals; this led to the
implementation of software text editors. These editors initially used
SPL structures (arrays) for storing and manipulating text. Since the
Translator required that the source program be in a contiguous
memory string, the user's source program was copied from the text
structure to the memory string just prior to translation. Source
pointers generated by the Translator therefore pointed to the copy,
requiring a search through the structure to find the original source and
its location. Both the copying and searching processes were painfully
slow; eventually this type of editor was replaced with one which
worked directly on memory.

Decompiler

A rather unique approach was taken to solve the above problems;
a program was constructed which "de-compiled" SYMBOL object
code back into SPL source statements. The decompilation process was
greatly facilitated because the SYMBOL instruction set was so similar
to the SPL language and by the direct and simple manner in which the
Translator generated object code. Decompilation was remarkably
effective in re-creating source; in most instances the decompiled
statement differed from the original source only in minor ways such as the
number of blanks, carriage returns, the case of letters, and the
omission (in the decompiled version) of redundant parentheses.

Execution Error Diagnostics

When an execution error occurred, the user process was
suspended and the System Supervisor invoked the Monitor on the
appropriate terminal. Use of the decompile program, coupled with a
program to interpret data values on the evaluation stack, allowed
excellent diagnostics to be given entirely in terms of the high level
source program. On an execution error the Monitor generated the
following:

1. Notification that an execution error had occurred.

2. The nature of the error (in an understandable form).

3. The statement at which the error occurred.

4. An arrow beneath the source line pointing to the particular
operator or operand causing the error.

5. If the error involved a monadic operator then its operand
was printed; if the error involved a dyadic operator then
both operands were identified and printed.

For example, division by zero in one program generated the following
diagnostic:

*** EXECUTION ERROR (ZERO DIVISOR - CODE JU)

IN  THE  FOLLOWING  STATEMENT: 

stl  »  qmin  •  (ml  •  m2  /  (|smms  /  rtw)  /  (mlmmn  t  (I  -  ID); 

↑

RIGHT OPERAND:

I  00  I 

LEFT  OPERAND: 

midterm:  t  -  C6M  I 

MONITOR IS NOW IN CONTROL


At this point the Monitor waited for commands from the user.
Logical choices would be to enter the text editor and correct the
problem or to enter the INQUIRE subsystem for further examination.

INQUIRE 

Knowing the source line and operands involved in an error is
only the first step in providing good high level language debugging
tools. For proper debugging we feel it is necessary to be able to
examine the contents of variables, to examine the currently active calling
sequence down to the source line invoking the call of each active
procedure, and to examine the state of the expression evaluation stack.
Such examination should be available after an execution error has
occurred or at any time during the normal running of a program. This
is the function of the INQUIRE subsystem. The INQUIRE subsystem
is the primary means of examining the user program variables and
block structure of the program. When entered, the "command
environment" is set to the block that was in execution when the user
program was interrupted.

INQUIRE responds to commands from the user in the following
way. Entering an identifier causes the value of that identifier to be
printed, providing that it is known to the command environment.
Special qualifications are given to several classes of variables. An
identifier that has never been referenced is tagged as "unreferenced null".
Procedure names, labels and switch components are tagged only as to
type. Parameters are identified and the parameter linking is followed
to the calling environment to resolve the parameter. Global variables
are identified and the value in the defining environment is printed.
Each scalar element of a vector is printed along with its subscript list.
(Successive nulls are grouped together in an attempt to save paper.)
Individual elements of a vector can also be obtained. All identifiers
known to a block can be obtained with the DATA command.

Identifiers from other than the current block are available as
well; one of the unique features of INQUIRE is the ways in which
various blocks can be traversed. For example, the value of an
identifier in the block calling the block of the current command environment
is obtained by preceding the identifier name with an "up-arrow"
character (↑ or ^). This specifies that the identifier is to be looked for by
going out one level from the command environment, according to the
dynamic nesting. In a similar manner, any number of dynamically
nested blocks may be traversed by preceding the identifier name with
the appropriate number of up-arrows.
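
The up-arrow convention amounts to indexing into the chain of active
blocks before looking up the name. The sketch below is a simplified
model, not the INQUIRE implementation: each active block is a plain
dictionary of its own identifiers, the chain is ordered from the command
environment outward, "^" stands in for the up-arrow, and parameter and
global resolution are ignored. All names are assumptions.

```python
def inquire_lookup(command, call_chain):
    """Resolve an identifier optionally preceded by up-arrows ("^").

    call_chain[0] is the command environment, call_chain[1] the block
    that called it, and so on, following the dynamic nesting.
    """
    name = command.lstrip("^")
    levels = len(command) - len(name)   # one level out per up-arrow
    if levels >= len(call_chain):
        raise LookupError("fewer than %d calling blocks are active" % levels)
    environment = call_chain[levels]
    if name not in environment:
        raise LookupError("%r is not known to that block" % name)
    return environment[name]
```

With a chain of three active blocks, `"^^limit"` would look for `limit`
in the block two calls out from the command environment.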

The static program nesting can also be used to specify a
particular block. Before this can be accomplished however, a BLOCK or
PROCS command must be given. The BLOCK command prints the
static block structure of the program and assigns an integer value to
each block. This number provides a unique naming for each block.
The character ">" followed by one of these integer values will change
the current command environment to the specified block number.
Setting the command environment to a block which was not a member of
the calling sequence allows looking at static variables but precludes
obtaining the values of any formal parameters since a non-active
procedure has no calling point for parameter linkage. Selection of a
non-active block also prohibits use of the up-arrow commands. In addition
to printing the block structure of a program, the BLOCK command
prints the names of all identifiers used in each block and an
abbreviated tag as to their data type, e.g., scalar, structure, label,
procedure, etc. The PROCS command is similar to the BLOCK
command with the exception that it prints only the block structure.

The WHERE command locates the statement in execution,
prints it on the console device, and then pauses. Pressing the
Continue button will cause one succeeding statement to be printed before
pausing again. This sequence is exited by pressing the F10 special
function button. The most used form of the WHERE command is in
conjunction with the "up-arrow" feature described previously. A
command consisting of ↑WHERE prints the statement that called the
procedure in execution, and thereby reveals the name of the procedure
and its actual parameters. In this manner the calling sequence may be
examined any number of levels on a very specific basis. Since an
entire statement is printed using the WHERE command, a more
specific reference is needed to isolate the exact point of execution. A
large expression may contain many operators and operands; for
example, the statement in the diagnostic of the previous section contained
several division operations. To isolate the exact point of execution or
error a pointer is printed beneath the statement directly below the
appropriate operand or operator.

Examination of expressions which may have been partially
evaluated is possible using the STACK command. This command
prints the top entry of the stack and then pauses. Pressing Continue
prints one successive stack entry and then pauses again if the bottom of
stack has not been reached. Pressing the F10 button before the bottom
of stack is reached will cause a return to the command mode. As SPL
is a block-structured language, there is a separate stack associated with
each active block. The stacks of other active procedures are accessed
by preceding the STACK command with the desired number of
up-arrows or by first entering the appropriate block via the ">" command.

If a program was interrupted by pressing the interrupt key, the
program may be resumed at the point of interruption by using the
RESUME command or at a label by using the GO TO command.
The GO TO command has the restriction that the label must be in a
block which is currently active. The RESUME command may not be
used after an execution error although GO TO may be used regardless
of the cause of the interrupt.

If an identifier has an ON block associated with it, that ON
block may be enabled or disabled from INQUIRE. A more detailed
description of ON blocks follows.

A brief description of INQUIRE commands is available from the
terminal with the HELP command. A listing of the HELP text is
given in Appendix 1. Appendix 2 shows a sample terminal session
using INQUIRE.

ON  blocks 

ON blocks are an SPL language construct extremely useful for
debugging. An ON block is similar to a procedure, in that it is a
series of statements invoked from some calling point. Unlike
procedures, however, invocation of an ON block is caused by the
occurrence of an implicit event specified by a list of names following
the ON declaration. If the list contains a variable name, the ON block
will be invoked immediately after an assignment to that variable
occurs. If the list contains a label, the ON block will be invoked upon
encountering a GO TO statement to that label before the transfer
actually takes place. If the list contains a procedure name, the ON block
will be invoked upon encountering a call to that procedure before
entry to the procedure takes place. If the list contains the word
INTERRUPT, the ON block will be invoked when the user presses
one of the function buttons (F1 thru F15).

The ON block facility bears a resemblance to the PL/I ON
CHECK condition. The major difference is that in SPL multiple ON
blocks are allowed to exist within a particular environment (scope) and
that the invocation of ON blocks can be controlled selectively for
individual identifiers. The IBM PL/I(F) compiler makes no provision for
dynamically enabling or disabling the CHECK condition, and while
the ON CHECK units may be dynamically switched around, such
switching applies equally to all variables to which the CHECK
condition applies. In SYMBOL invocation of an ON block for a particular
identifier is controllable by the SPL ENABLE and DISABLE
statements.
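
The event ordering and the per-identifier control described above can
be modeled in a few lines. This is a behavioral sketch in Python, not
SPL and not the hardware mechanism; the class and method names are
assumptions. It covers only the variable and procedure cases.

```python
class OnBlocks:
    """Toy model of ON blocks attached to individual identifiers."""

    def __init__(self):
        self.blocks = {}      # identifier name -> ON block (a callable)
        self.enabled = set()  # identifiers whose ON block is enabled
        self.variables = {}

    def on(self, names, block):
        """Declare one ON block for a list of names; it starts enabled."""
        for name in names:
            self.blocks[name] = block
            self.enabled.add(name)

    def enable(self, name):
        self.enabled.add(name)

    def disable(self, name):
        self.enabled.discard(name)

    def _fire(self, name):
        block = self.blocks.get(name)
        if block is not None and name in self.enabled:
            block(name)

    def assign(self, name, value):
        """Variable event: the ON block runs immediately AFTER the assignment."""
        self.variables[name] = value
        self._fire(name)

    def call(self, name, procedure, *args):
        """Procedure event: the ON block runs BEFORE entry to the procedure."""
        self._fire(name)
        return procedure(*args)
```

Because enabling is tracked per identifier, disabling the block for one
name leaves any other ON blocks in the same scope untouched.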

A typical use of an ON block is shown in Figure 1, which
illustrates a method to discover where a variable is assigned undesired (or
desired) values. The value of I will be printed every time it is
modified and the user can then decide whether to continue, or interrupt his
program and diagnose further with INQUIRE. Once the user is
satisfied that the particular portion of the program being monitored by an
ON block is behaving properly the ON block can be disabled from
INQUIRE. The implementation is a major advance over what is
possible in most systems in that no extra code needs to be generated to
invoke an ON block nor does the program have to be re-compiled to
turn on or off the invocation of an ON block. This has major benefits
in terms of execution efficiency and the ability to debug non-stop
programs, not to speak of the time saved in editing and re-compiling
programs after changing the debugging options. The ON block is also a
clean way of debugging a program in that it concentrates the
debugging code in one place, in contrast to spreading debug I/O throughout
a program; this practically eliminates needing to "clean up" a program
after debugging.

ON I; NOTE This block invoked whenever I is assigned to;

GLOBAL I;

OUTPUT | The value of I is |, I;

PAUSE;

END

Figure 1. Simple ON block

The descriptor orientation of SYMBOL was a major factor in the
efficient implementation of the ON block facility. Descriptors were
sixty-four bits long and contained sixteen tag bits and two twenty-four
bit address fields. An identifier with an associated ON block had the
left address field pointing to the identifier value and the right address
field pointing to the start of the object code of the ON block. The
ENABLE and DISABLE statements either set or reset an "ON
Enabled" bit in the tag field. As the descriptor had to be referenced
for every identifier reference, checking to see if an identifier had an
ON block associated with it could be done in parallel with normal
accessing without loss of performance.

Evaluation

To a large extent the tools developed show what was easy or
reasonable to do with the SYMBOL architecture. Descriptors and type
tags allowed the type and values of data objects to be easily
interpreted. The additional level of indirection imposed by descriptors was
extremely important in implementing ON blocks. Being able to
selectively Enable or Disable ON blocks from INQUIRE or dynamically in
the user's program without recompilation drastically reduced the
compilations and editing that might otherwise have been required. Some
credit has to be given to the designers of the SYMBOL language for
introducing ON blocks with Enable and Disable statements.
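
To make the role of the descriptor concrete, the sixty-four bit layout
given earlier (sixteen tag bits and two twenty-four bit address fields)
can be written as bit arithmetic. The field widths come from the paper;
the ordering of the fields within the word, the position of the "ON
Enabled" bit inside the tag, and all function names are assumptions
made for this sketch.

```python
TAG_BITS, ADDR_BITS = 16, 24           # 16 + 24 + 24 = 64 bits
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_SHIFT = 2 * ADDR_BITS
ON_ENABLED = 1 << 0                    # hypothetical bit position in the tag


def make_descriptor(tag: int, left: int, right: int) -> int:
    """Pack tag, left address (value) and right address (ON block code)."""
    return (tag << TAG_SHIFT) | (left << ADDR_BITS) | right


def fields(descriptor: int):
    """Return (tag, left_address, right_address)."""
    return (descriptor >> TAG_SHIFT,
            (descriptor >> ADDR_BITS) & ADDR_MASK,
            descriptor & ADDR_MASK)


def enable(descriptor: int) -> int:
    """ENABLE: set the ON Enabled bit in the tag field."""
    return descriptor | (ON_ENABLED << TAG_SHIFT)


def disable(descriptor: int) -> int:
    """DISABLE: reset the ON Enabled bit in the tag field."""
    return descriptor & ~(ON_ENABLED << TAG_SHIFT)


def on_block_target(descriptor: int):
    """Address of the ON block's object code if enabled, else None.

    The check uses only the descriptor that is fetched for every
    identifier reference anyway, so no extra code is needed at the
    point of use.
    """
    tag, _, right = fields(descriptor)
    return right if tag & ON_ENABLED else None
```

Since enabling and disabling only flip one tag bit, neither requires
the program to be translated again.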

Decompilation is a subject which requires several comments.
First, it must be stated that we had almost no control over the
instruction set or the code generated by the Translator. While we
could have generated better code with a software compiler,
experimentation proved a software compiler to be impractical because of its
slow speed. Fortunately, the high level instruction set and simple
code generation algorithms made object code relatively easy to invert.
On the negative side, decompilation was not trivial (some 900 lines of
code), nor was it fast (3 to 10 seconds/statement). Decompilation has
several other negative characteristics. Starting to decompile from the
middle of control flow instructions (e.g. if-then-else, looping, procedure
body) made decompiling the bottom part of the flow syntax difficult;
this could have been much easier if, for example, the jump over an
"else" clause had been distinct from other jumps. Comments and
declarations generated no code, and hence would never re-appear in a
decompiled program. The minor differences in number of blanks,
carriage returns, and case of letters were very irritating when trying to
find the same source line in an editor by using an exact string search.
On the whole, if one has control over the compiler there exist much
better techniques for mapping object code back into source
statements.4,7 Decompilation was used in our case because we had few
other options.


Users of the SYMBOL system were very pleased with the programming
and debugging environment; in particular with the way
INQUIRE allowed the investigation of their block structured programs.
The software debugging tools were the finishing touch in making
SYMBOL a High Level Language Computer System,1 6 rather than
just a machine with a fancier instruction set. The disappointing part
for ex-users of the SYMBOL system is that there are no inherent reasons
why similar features could not be provided even on low level
language machines, yet such debugging systems are not appearing.
What the SYMBOL architecture did for us was make the job of building
some of our tools easier than would have been possible on a more
traditional machine.
Appendix 1. INQUIRE Help Text


This is the Inquiry subsystem, which permits examination of user-program variables and block
structure. The following inputs are accepted:

1. An identifier. The value of the identifier will be printed, if possible. Otherwise an appropriate
message will be produced. "Identifier" here includes LIMIT and LIMITED.

2. "LIMIT=n" where n is a number between 0 and 99. The value of LIMIT is set accordingly.

3. The character ">" followed by:

a. a number obtained from the output produced by the /BLOCK command (see below),

b. a string of one or more "↑" characters, or

c. nothing.

This respecifies the command environment as:

in case a. the specified block.

in case b. one block out from the current setting for each "↑" in the string (following
the dynamic nesting, i.e., the order of activation).

in case c. the environment which was current when the Monitor was invoked.

When the Inquiry mode is entered, case c. is assumed. In case b., if the current command
environment was not active when the Monitor was invoked, a message is printed and the
command environment is not changed.

4. The character "/" followed by a command keyword. Only enough of the keyword to distinguish
it from all others is required. The keywords are described in the following paragraphs.

5. "/BLOCK". The static block structure of the user program is printed, each block is identified
to the extent possible, the names and attributes of all identifiers known in each block are
listed, and each block is assigned a reference number for use in setting the command environment
(see paragraph 3). The current command environment and the blocks which were
active when the Monitor was invoked are identified. The listing may be terminated by pressing
FO.

6. "/DATA". The values of all identifiers known in the current command environment are
printed, similarly to paragraph 1. To cancel, press FO.

7. "/WHERE". If the specified block was active when the Monitor was invoked, a reconstruction
of the statement which was being executed will be displayed, and the Monitor will pause.
Pressing CONTINUE will evoke consecutive statements; pressing FO will direct the Monitor
to input a new Inquiry command.

8. "/STACK". If the specified block was active when the Monitor was invoked, the top item
on its stack will be displayed similarly to paragraph 1. Pressing CONTINUE will display successive
stack items; pressing FO will direct the Monitor to input a new Inquiry command.

9. "/ENABLE". An identifier is requested, and the ON-block associated with the identifier is
enabled. If the identifier does not have an ON-block, an appropriate message is produced.

10. "/DISABLE". Behaves similarly to paragraph 9, but the ON-block is disabled.

11. "/GOTO". A label is requested, and user program execution is resumed at that point. The
label must be in an environment which was active when the Monitor was invoked.

12. "/MONITOR". Return to Monitor.

13. "/EDIT". Equivalent to "/MONITOR" followed by "EDIT".

14. "/RESUME". Equivalent to "/MONITOR" followed by "RESUME".

15. "/PROCS". Similar to "/BLOCK", but does not list the identifiers in each block.

Any of the above inputs may be prefixed by one or more "↑" characters. This will cause the command
environment to be respecified as in paragraph 3b, but for that one input only.
If FO is pressed while in input, the input is ignored.
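
The keyword abbreviation rule of paragraph 4 and the prefix rule of the closing paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the INQUIRE implementation: the function names are hypothetical, "^" is used as a stand-in for the "one block out" prefix character, and inputs beginning with ">" and the "LIMIT" setting are not handled.

```python
from typing import Tuple

# Command keywords taken from paragraphs 5 to 15 of the help text.
KEYWORDS = ("BLOCK", "DATA", "WHERE", "STACK", "ENABLE", "DISABLE",
            "GOTO", "MONITOR", "EDIT", "RESUME", "PROCS")

OUT = "^"  # assumption: stand-in for the "one block out" prefix character


def resolve_keyword(abbrev: str) -> str:
    """Expand an abbreviation to the single keyword it identifies."""
    word = abbrev.upper()
    if word in KEYWORDS:
        return word
    matches = [k for k in KEYWORDS if k.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown keyword: " + abbrev)
    raise ValueError("ambiguous keyword: " + abbrev + " -> " + ", ".join(matches))


def parse_input(text: str) -> Tuple[int, str, str]:
    """Split one input into (blocks_out, kind, argument).

    blocks_out counts the leading prefix characters; kind is "command"
    for "/keyword" inputs and "identifier" for everything else.
    """
    stripped = text.strip()
    blocks_out = len(stripped) - len(stripped.lstrip(OUT))
    rest = stripped[blocks_out:].strip()
    if rest.startswith("/"):
        return blocks_out, "command", resolve_keyword(rest[1:].strip())
    return blocks_out, "identifier", rest
```

With this keyword set, "/w" and "/p" are already unambiguous, while "/e" and "/d" are rejected because they match two keywords each.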
Appendix 2. Example Program and Execution


NOTE Demonstration program. Keywords capitalized. User input italicized;

Inum |1| Jnum |2| Console |1| NOTE Initial value statements;

Vector << 1 | 2 | 3 > < One | Two | Three > < 123,456.78 | 3x3 Matrix | >>;

Vector[345,6] = |A Scalar string.|;

Repeat Scan:

    OUTPUT | What is LineA ? |; INPUT LineA;
    OUTPUT | What is LineB ? |; INPUT LineB;
    perform lexical scan( LineA, LineB, Stmnt );

Until( Inum EQUALS Jnum , Repeat Scan );

PROCEDURE WhichRoutine( name );
    GLOBAL Inum, Jnum;

    IF name SAME | PARSE |
    THEN RETURN 1;
    ELSE IF name AFTER | M |
        THEN Inum = 20; RETURN 2;
        ELSE Jnum = 3; RETURN 3;
    END END
END

PROCEDURE Perform lexical scan( String1, String2, Statement );
    SWITCH Routine< Routine1 | Routine2 | Routine3 >;

    S1 = No Blanks( String1 );
    target = WhichRoutine( String2 );

    GO TO Routine[ target ];

    Routine1: Statement = ( S1 FORM, rl*DOD.IFD|MASKl4SA.FCl) JOIN | Long |;
    RETURN;

    Routine2: Statement = test( String1 BEFORE String2 AND Target EQUALS 2 );
    RETURN;

    PROCEDURE test( boolop );
        OUTPUT | Paused. |;
        PAUSE;
        IF boolop THEN RETURN 0 ELSE RETURN 1 END
    END

    PROCEDURE NoBlanks( line );
        BLOCK
            test = 5;
        END
        RETURN line MASK | FA |;
    END

END NOTE End of Perform lexical scan;

PROCEDURE Until( Condition , Label );
    IF Condition THEN RETURN ELSE GO TO Label END
END

ON Jnum;
    GLOBAL Jnum, Inum, Console;

    IF Jnum EQUALS Inum OR Inum GREATER THAN 17
    THEN OUTPUT TO Console, | Error Detected - Jnum Invalid |, | Paused. |;
        PAUSE;
    END
END
Input from user is italicized.
Comma  art  b  AUhti. 


run

What is LineA ?
493.08

What is LineB ?
TZBI23

Paused.

(At this point the user presses the Interrupt key.)

MONITOR IS NOW IN CONTROL.

?inquire

/where
PAUSE ;
↑

IF boolop THEN
RETURN 0;

↑/where

Routine2: Statement = test(String1 BEFORE String2 AND target EQUAL 2);


/procs

MAIN PROGRAM = 1 ACTIVE (LEVEL 1)
    ON BLOCK FOR Jnum = 2
    PROCEDURE Until = 3
    PROCEDURE perform lexical scan = 4 ACTIVE (LEVEL 2)
        PROCEDURE NoBlanks = 5
            INNER BLOCK = 6
        PROCEDURE test = 7 ACTIVE (LEVEL 3)
            **INTERRUPT IN THIS BLOCK**
            **CURRENT COMMAND ENVIRONMENT**
    PROCEDURE WhichRoutine = 8

string1

"STRING1" IS NOT KNOWN IN THE SPECIFIED ENVIRONMENT.

↑string1

STRING1: formal parameter.
The actual parameter is "LineA" (in the calling environment).
LineA: |493.08|

↑statement

STATEMENT: formal parameter.
The actual parameter is "Stmnt" (in the calling environment).
Stmnt: null

/resume

What is LineA ?
353.937

What is LineB ?
A1B2C3

Error Detected - Jnum Invalid
Paused.

(User presses Interrupt.)

MONITOR IS NOW IN CONTROL.

?inquire

/where
PAUSE ;
↑

↑/where
Jnum = 3;

(User presses Continue for another source line.)

RETURN 3;

↑↑/where

target = WhichRoutine(String2);

↑↑↑/where

perform lexical scan(LineA, LineB, Stmnt);

↑↑↑↑/where

ATTEMPT TO BACK UP BEYOND OUTERMOST BLOCK


/procs

MAIN PROGRAM = 1 ACTIVE (LEVEL 1)
    ON BLOCK FOR Jnum = 2 ACTIVE (LEVEL 4)
        **INTERRUPT IN THIS BLOCK**
        **CURRENT COMMAND ENVIRONMENT**
    PROCEDURE Until = 3
    PROCEDURE perform lexical scan = 4 ACTIVE (LEVEL 2)
        PROCEDURE NoBlanks = 5
            INNER BLOCK = 6
        PROCEDURE test = 7
    PROCEDURE WhichRoutine = 8 ACTIVE (LEVEL 3)

/resume

*** EXECUTION ERROR "GO TO" TO NON-LABEL (CODE H2)

IN THE FOLLOWING STATEMENT

GO TO Routine[ target ];
↑

OPERAND:

null

MONITOR IS NOW IN CONTROL.

?inquire

target

TARGET: |3|

routine[3]

ROUTINE[3]: "Routine3" (switch component)

routine3

ROUTINE3: null
/edit

?search FOR "Routine: " FROM LINE /

3ft : Routine2: Statement = test( String1 BEFORE String2 AND Target EQUALS 2 );

?insert AFTER LINE > 2

routine3: NOTE adding this code after detection of missing label; RETURN

?inquire

/go TO: LABEL Routine3



What is LineA ?
999.089

What is LineB ?

(User presses Interrupt.)

MONITOR IS NOW IN CONTROL.

?inquire

/block

MAIN PROGRAM = 1 ACTIVE (LEVEL 1)
**INTERRUPT IN THIS BLOCK**
**CURRENT COMMAND ENVIRONMENT**

    Inum, Jnum(ON), Console, Vector(S), Repeat Scan(L), LineA, LineB,
    perform lexical scan(Pr), Stmnt, Until(Pr), WhichRoutine(Pr)

ON BLOCK FOR Jnum = 2
    Jnum(G,ON), Inum(G), Console(G)

PROCEDURE Until = 3
    Condition(Pa), Label(Pa)

PROCEDURE perform lexical scan = 4
    String1(Pa), String2(Pa), Statement(Pa), Routine(S), Routine1(L),
    Routine2(L), Routine3, S1, NoBlanks(Pr), target, WhichRoutine(G,Pr),
    test(Pr)

PROCEDURE NoBlanks = 5
    line(Pa)

INNER BLOCK = 6
    test

PROCEDURE test = 7
    boolop(Pa)

PROCEDURE WhichRoutine = 8
    name(Pa), Inum(G), Jnum(G,ON)

test

"TEST" IS NOT KNOWN IN THE SPECIFIED ENVIRONMENT

>4

test

TEST: procedure.

>6

test

TEST: |5|

>8

/data

name: formal parameter.
Inum: global. In the defining environment,
Inum: |20|
Jnum: global. In the defining environment,
Jnum: |3|

>1

/data

Inum: |20|
Jnum: |3|
Console: |1|
Vector
[1,1]: |1| [1,2]: |2| [1,3]: |3|
[2,1]: |One| [2,2]: |Two| [2,3]: |Three|
[3,1]: |123,456.78| [3,2]: |3x3Matrix| [3,3]: | |
[4-344]: nulls
[345,1-5]: nulls [345,6]: |A Scalar string.|
Repeat Scan: label.
LineA: |999.089|
LineB: |A1B2C3|
perform lexical scan: procedure.
Stmnt: | $555001121 .ong |
Until: procedure.
WhichRoutine: procedure.

/disable: IDENTIFIER = Jnum

/enable: IDENTIFIER = repeat scan
NO ON BLOCK FOR REPEAT SCAN

/monitor

?terminate

END OF TERMINAL SESSION
PROCESSING TIME: 24.2 SEC.
DURATION OF SESSION: 60.2 MIN

PRESS CTRL-0 TO START
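
The way the session backs out through the active blocks until the "ATTEMPT TO BACK UP BEYOND OUTERMOST BLOCK" message appears can be modelled with a short Python sketch of paragraph 3 of the help text. It is a sketch under assumptions, not the SYMBOL code: the names are hypothetical, "^" stands in for the "one block out" character, blocks are identified by their /BLOCK reference numbers, and a block is assumed to be active at most once (no recursion).

```python
from typing import Sequence

OUT = "^"  # assumption: stand-in for the "one block out" character


class BackupError(Exception):
    """Raised when the command environment cannot be moved outward."""


def step_out(current: int, active: Sequence[int], levels: int) -> int:
    """Move `levels` blocks out from `current` along the dynamic nesting.

    `active` lists the active blocks in order of activation, outermost
    first; its last entry was current when the Monitor was invoked.
    """
    if current not in active:
        raise BackupError("CURRENT COMMAND ENVIRONMENT WAS NOT ACTIVE")
    index = active.index(current) - levels
    if index < 0:
        raise BackupError("ATTEMPT TO BACK UP BEYOND OUTERMOST BLOCK")
    return active[index]


def resolve_environment(command: str, current: int,
                        active: Sequence[int], block_count: int) -> int:
    """Return the new command environment for an input starting with '>'."""
    arg = command.strip()[1:].strip()
    if arg == "":                      # case c: environment at interrupt
        return active[-1]
    if arg.isdigit():                  # case a: a /BLOCK reference number
        number = int(arg)
        if not 1 <= number <= block_count:
            raise ValueError("no such block: " + arg)
        return number
    if set(arg) == {OUT}:              # case b: one block out per character
        return step_out(current, active, len(arg))
    raise ValueError("unrecognized environment request: " + command)
```

For the second interrupt in the session the active chain is main program (1), perform lexical scan (4), WhichRoutine (8), ON block (2), so stepping out one, two and three blocks from block 2 yields 8, 4 and 1, and a fourth step raises the error.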